![Page 1: Simulations of Memory Hierarchy LAB 2: CACHE LAB](https://reader035.vdocuments.site/reader035/viewer/2022062313/56649cf95503460f949ca85c/html5/thumbnails/1.jpg)
Simulations of Memory Hierarchy
LAB 2: CACHE LAB
![Page 2: Simulations of Memory Hierarchy LAB 2: CACHE LAB](https://reader035.vdocuments.site/reader035/viewer/2022062313/56649cf95503460f949ca85c/html5/thumbnails/2.jpg)
OVERVIEW
• Objectives
• Cache Set-Up
• Command line parsing
• Least Recently Used (LRU)
• Matrix Transposition
• Cache-Friendly Code
![Page 3: Simulations of Memory Hierarchy LAB 2: CACHE LAB](https://reader035.vdocuments.site/reader035/viewer/2022062313/56649cf95503460f949ca85c/html5/thumbnails/3.jpg)
OBJECTIVE
• There are two parts to this lab:
• Part A: Cache Simulator
• Simulate a cache table using the LRU algorithm
• Part B: Optimizing Matrix Transpose
• Write “cache-friendly” code that minimizes cache misses in the implementation of a matrix transpose function
• When submitting your lab, please submit the handin.tar file as described in the instructions.
![Page 4: Simulations of Memory Hierarchy LAB 2: CACHE LAB](https://reader035.vdocuments.site/reader035/viewer/2022062313/56649cf95503460f949ca85c/html5/thumbnails/4.jpg)
MEMORY HIERARCHY
• Pick your poison: smaller, faster, and costlier, or larger, slower, and cheaper
![Page 5: Simulations of Memory Hierarchy LAB 2: CACHE LAB](https://reader035.vdocuments.site/reader035/viewer/2022062313/56649cf95503460f949ca85c/html5/thumbnails/5.jpg)
CACHE ADDRESSING
• X-bit memory addresses (in Part A, X <= 64 bits)
• Block offset: b bits
• Set index: s bits
• Tag: the remaining X - b - s bits
• Cache is a collection of S=2^s cache sets
• Cache set is a collection of E cache lines
• E is the associativity of the cache
• If E=1, the cache is called “direct-mapped”
• Each cache line stores a block of B=2^b bytes of data
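The field layout above can be sketched in C as follows. This is a minimal illustration, assuming 64-bit addresses with the block offset in the lowest b bits, the set index in the next s bits, and the tag in the remaining high bits. The names `addr_parts` and `split_address` are hypothetical and not part of the lab handout.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: the three fields of a memory address. */
typedef struct {
    uint64_t tag;
    uint64_t set_index;
    uint64_t block_offset;
} addr_parts;

/* Split a 64-bit address using s set-index bits and b block-offset bits. */
static addr_parts split_address(uint64_t addr, unsigned s, unsigned b)
{
    addr_parts p;

    assert(s + b < 64); /* keeps every shift below well defined */

    p.block_offset = addr & ((UINT64_C(1) << b) - 1);      /* low b bits  */
    p.set_index = (addr >> b) & ((UINT64_C(1) << s) - 1);  /* next s bits */
    p.tag = addr >> (s + b);                               /* the rest    */
    return p;
}
```

For example, with s = 4 and b = 5, the address 0x12345678 splits into block offset 0x18, set index 3, and tag 0x91A2B.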
![Page 6: Simulations of Memory Hierarchy LAB 2: CACHE LAB](https://reader035.vdocuments.site/reader035/viewer/2022062313/56649cf95503460f949ca85c/html5/thumbnails/6.jpg)
ADDRESS ANATOMY
![Page 7: Simulations of Memory Hierarchy LAB 2: CACHE LAB](https://reader035.vdocuments.site/reader035/viewer/2022062313/56649cf95503460f949ca85c/html5/thumbnails/7.jpg)
CACHE TABLE BASICS
• Parameters:
• Number of sets (S)
• Block size in bytes (B)
• Number of lines per set (E)
• Note that the total data capacity of this cache would be S*E*B bytes
• Blocks are the fundamental units of the cache
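One possible way to represent such a cache table in C is sketched below. It assumes the table is a flat array of S*E lines in which set i occupies entries i*E through i*E + E - 1. All type and function names here (`cache_line`, `cache`, `cache_init`, `cache_capacity_bytes`, `cache_free`) are hypothetical, and the `last_used` field is only a placeholder for whatever replacement bookkeeping is chosen.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical line record: no data bytes are stored, only metadata. */
typedef struct {
    int valid;          /* 1 if this line currently holds a block */
    uint64_t tag;       /* tag of the block held by this line */
    uint64_t last_used; /* placeholder for replacement bookkeeping */
} cache_line;

typedef struct {
    size_t S;           /* number of sets, 2^s */
    size_t E;           /* lines per set (associativity) */
    size_t B;           /* block size in bytes, 2^b */
    cache_line *lines;  /* S*E lines; set i starts at lines[i * E] */
} cache;

/* Allocate an empty cache; returns 0 on success, -1 if allocation fails. */
static int cache_init(cache *c, unsigned s, unsigned E, unsigned b)
{
    assert(c != NULL && E > 0 && s < 32 && b < 32);
    c->S = (size_t)1 << s;
    c->E = E;
    c->B = (size_t)1 << b;
    c->lines = calloc(c->S * c->E, sizeof *c->lines); /* all lines invalid */
    return c->lines != NULL ? 0 : -1;
}

/* Total data capacity in bytes: S * E * B. */
static size_t cache_capacity_bytes(const cache *c)
{
    return c->S * c->E * c->B;
}

static void cache_free(cache *c)
{
    free(c->lines);
    c->lines = NULL;
}
```

With s = 5, E = 1, and b = 5, this describes a direct-mapped cache of 32 sets with 32-byte blocks, for a capacity of 32 * 1 * 32 = 1024 bytes.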
![Page 8: Simulations of Memory Hierarchy LAB 2: CACHE LAB](https://reader035.vdocuments.site/reader035/viewer/2022062313/56649cf95503460f949ca85c/html5/thumbnails/8.jpg)
CACHE TABLE CORRESPONDENCE WITH ADDRESS
![Page 9: Simulations of Memory Hierarchy LAB 2: CACHE LAB](https://reader035.vdocuments.site/reader035/viewer/2022062313/56649cf95503460f949ca85c/html5/thumbnails/9.jpg)
Example for a 32-bit address
![Page 10: Simulations of Memory Hierarchy LAB 2: CACHE LAB](https://reader035.vdocuments.site/reader035/viewer/2022062313/56649cf95503460f949ca85c/html5/thumbnails/10.jpg)
CACHE SET LOOK-UP
• Determine the set index and the tag bits based on the memory address
• Locate the corresponding cache set and determine whether or not there exists a valid cache line with a matching tag
• If a cache miss occurs:
• If the set has an empty (invalid) cache line, use it
• If the set is full, then a cache line must be evicted to make room
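The look-up steps above could be sketched as a scan over one set. This is an illustrative sketch only: the minimal `cache_line` type, the `lookup_result` enum, and `lookup_set` are hypothetical names, and the sketch only classifies the access without choosing a victim line.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal hypothetical line record for this sketch. */
typedef struct {
    int valid;
    uint64_t tag;
} cache_line;

typedef enum {
    LOOKUP_HIT,        /* a valid line with a matching tag exists */
    LOOKUP_MISS_EMPTY, /* miss, but the set still has an empty line */
    LOOKUP_MISS_FULL   /* miss and every line is valid: evict one */
} lookup_result;

/*
 * Scan the E lines of one set for `tag`.
 * On a hit, *where is the index of the matching line.
 * On a miss with free space, *where is the first empty line.
 * On a miss in a full set, *where is left unchanged.
 */
static lookup_result lookup_set(const cache_line *set, size_t E,
                                uint64_t tag, size_t *where)
{
    int have_empty = 0;
    size_t empty = 0;

    assert(set != NULL && where != NULL);

    for (size_t i = 0; i < E; i++) {
        if (set[i].valid && set[i].tag == tag) {
            *where = i;
            return LOOKUP_HIT;
        }
        if (!set[i].valid && !have_empty) {
            have_empty = 1; /* remember the first empty line */
            empty = i;
        }
    }
    if (have_empty) {
        *where = empty;
        return LOOKUP_MISS_EMPTY;
    }
    return LOOKUP_MISS_FULL;
}
```

The whole set is scanned before an empty line is used, so a matching tag stored after an empty slot is still reported as a hit.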
![Page 11: Simulations of Memory Hierarchy LAB 2: CACHE LAB](https://reader035.vdocuments.site/reader035/viewer/2022062313/56649cf95503460f949ca85c/html5/thumbnails/11.jpg)
TYPES OF CACHE MISSES
• Compulsory Miss:
• First access to a block has to be a miss
• Conflict Miss:
• The cache is large enough, but multiple blocks map to the same cache set and keep evicting one another
• Capacity Miss:
• Occurs when the working set of blocks (blocks of memory being used) is larger than the cache
![Page 12: Simulations of Memory Hierarchy LAB 2: CACHE LAB](https://reader035.vdocuments.site/reader035/viewer/2022062313/56649cf95503460f949ca85c/html5/thumbnails/12.jpg)
PART A: CACHE SIMULATION
![Page 13: Simulations of Memory Hierarchy LAB 2: CACHE LAB](https://reader035.vdocuments.site/reader035/viewer/2022062313/56649cf95503460f949ca85c/html5/thumbnails/13.jpg)
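YOUR OWN CACHE SIMULATOR
• NOT a real cache
• Block offsets are NOT used, because the simulator stores no data, but they are important in understanding the concept of a cache (b is still needed to locate the set index and tag)
• s, b, and E are given at runtime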
![Page 14: Simulations of Memory Hierarchy LAB 2: CACHE LAB](https://reader035.vdocuments.site/reader035/viewer/2022062313/56649cf95503460f949ca85c/html5/thumbnails/14.jpg)
FUNCTIONS TO USE FOR COMMAND LINE PARSING
• int getopt(int argc, char *const *argv, const char *options)
• See: http://www.gnu.org/software/libc/manual/html_node/Example-of-Getopt.html#Example-of-Getopt
• long long int strtoll(const char* str, char** endptr, int base)
• See: http://www.cplusplus.com/reference/cstdlib/strtoll/
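A short sketch of how the two functions might be combined is shown below. The option letters `-s`, `-E`, and `-b` are an assumption chosen to match the parameter names used in these slides, and `sim_config`, `parse_ll`, and `parse_args` are hypothetical names. Check the lab instructions for the actual required flags.

```c
#define _POSIX_C_SOURCE 200809L /* expose getopt under strict -std modes */
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <unistd.h> /* getopt, optarg, optind (POSIX) */

/* Hypothetical container for the three runtime parameters. */
typedef struct {
    long long s, E, b;
} sim_config;

/* Parse a whole string as a base-10 integer; 0 on success, -1 on error. */
static int parse_ll(const char *text, long long *out)
{
    char *end = NULL;
    long long v;

    errno = 0;
    v = strtoll(text, &end, 10);
    if (errno != 0 || end == text || *end != '\0')
        return -1; /* overflow, no digits, or trailing junk */
    *out = v;
    return 0;
}

/* Read -s, -E and -b (assumed flags); 0 if all three are present and valid. */
static int parse_args(int argc, char *const argv[], sim_config *cfg)
{
    int opt;
    int seen = 0;

    assert(cfg != NULL);
    optind = 1; /* start scanning at argv[1] */
    while ((opt = getopt(argc, argv, "s:E:b:")) != -1) {
        long long v;

        /* '?' means an unknown option or a missing argument. */
        if (opt == '?' || parse_ll(optarg, &v) != 0 || v < 0)
            return -1;
        switch (opt) {
        case 's': cfg->s = v; seen |= 1; break;
        case 'E': cfg->E = v; seen |= 2; break;
        case 'b': cfg->b = v; seen |= 4; break;
        default:  return -1;
        }
    }
    return seen == 7 ? 0 : -1;
}
```

A `main` function would typically call `parse_args(argc, argv, &cfg)` once and print a usage message when it returns -1.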
![Page 15: Simulations of Memory Hierarchy LAB 2: CACHE LAB](https://reader035.vdocuments.site/reader035/viewer/2022062313/56649cf95503460f949ca85c/html5/thumbnails/15.jpg)
LEAST RECENTLY USED (LRU) ALGORITHM
• A least recently used algorithm should be used to decide which cache line to evict
• Each cache line will need some sort of “time” field, which should be updated each time that cache line is referenced
• If a cache miss occurs in a full cache set, the cache line with the oldest (least recent) time field should be evicted
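One way to realize the “time” field is a counter that the caller increments on every access, as in the sketch below. This is only one possible design (a linked list ordered by recency would be another). The names `cache_line`, `access_result`, and `access_set` are hypothetical, and the caller is assumed to pass a strictly increasing `now` value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical line record with a timestamp for LRU. */
typedef struct {
    int valid;
    uint64_t tag;
    uint64_t last_used; /* value of the access counter at last reference */
} cache_line;

typedef enum { ACCESS_HIT, ACCESS_MISS, ACCESS_MISS_EVICT } access_result;

/*
 * Simulate one access to a set of E lines.
 * `now` must be larger than on any earlier call (caller increments it).
 */
static access_result access_set(cache_line *set, size_t E,
                                uint64_t tag, uint64_t now)
{
    size_t victim = 0;   /* line to fill on a miss */
    int found_empty = 0;
    access_result result;

    assert(set != NULL && E > 0);

    for (size_t i = 0; i < E; i++) {
        if (set[i].valid && set[i].tag == tag) {
            set[i].last_used = now; /* refresh the time field on a hit */
            return ACCESS_HIT;
        }
        if (found_empty)
            continue;               /* an empty line beats any eviction */
        if (!set[i].valid) {
            victim = i;
            found_empty = 1;
        } else if (set[i].last_used < set[victim].last_used) {
            victim = i;             /* oldest valid line seen so far */
        }
    }

    result = found_empty ? ACCESS_MISS : ACCESS_MISS_EVICT;
    set[victim].valid = 1;
    set[victim].tag = tag;
    set[victim].last_used = now;
    return result;
}
```

For a two-line set and the tag sequence A, B, A, C, this sketch reports miss, miss, hit, and then a miss with eviction of B, since B has the oldest time field when C arrives.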
![Page 16: Simulations of Memory Hierarchy LAB 2: CACHE LAB](https://reader035.vdocuments.site/reader035/viewer/2022062313/56649cf95503460f949ca85c/html5/thumbnails/16.jpg)
PART B: OPTIMIZING MATRIX TRANSPOSE
![Page 17: Simulations of Memory Hierarchy LAB 2: CACHE LAB](https://reader035.vdocuments.site/reader035/viewer/2022062313/56649cf95503460f949ca85c/html5/thumbnails/17.jpg)
WHAT IS A MATRIX TRANSPOSITION?
• The transpose of a matrix A is denoted as A^T
• The rows of A^T are the columns of A, and the columns of A^T are the rows of A
• Example:
![Page 18: Simulations of Memory Hierarchy LAB 2: CACHE LAB](https://reader035.vdocuments.site/reader035/viewer/2022062313/56649cf95503460f949ca85c/html5/thumbnails/18.jpg)
GENERAL MATRIX TRANSPOSITION
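As a baseline, a straightforward transpose could be written as in the sketch below. It assumes both matrices are stored in row-major order in flat arrays, with A having M rows and N columns and B having N rows and M columns. The name `transpose_naive` and this flat-array signature are hypothetical and may differ from the function signature required by the lab.

```c
#include <assert.h>
#include <stddef.h>

/* B (N x M) receives the transpose of A (M x N); both are row-major. */
static void transpose_naive(size_t M, size_t N, const int *A, int *B)
{
    assert(A != NULL && B != NULL);

    for (size_t i = 0; i < M; i++) {
        for (size_t j = 0; j < N; j++) {
            /* A is read along a row, B is written down a column. */
            B[j * M + i] = A[i * N + j];
        }
    }
}
```

Transposing the 2 x 3 matrix with rows (1, 2, 3) and (4, 5, 6) gives the 3 x 2 matrix with rows (1, 4), (2, 5), and (3, 6).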
![Page 19: Simulations of Memory Hierarchy LAB 2: CACHE LAB](https://reader035.vdocuments.site/reader035/viewer/2022062313/56649cf95503460f949ca85c/html5/thumbnails/19.jpg)
CACHE-FRIENDLY CODE
• In order to have fewer cache misses, you must make good use of:
• Temporal locality: reuse the current cache block while it is still in the cache (avoid conflict misses [thrashing])
• Spatial locality: access data stored at nearby memory locations
• Tips:
• Cache blocking
• Optimized access patterns
• Your code should look ugly if done correctly
![Page 20: Simulations of Memory Hierarchy LAB 2: CACHE LAB](https://reader035.vdocuments.site/reader035/viewer/2022062313/56649cf95503460f949ca85c/html5/thumbnails/20.jpg)
CACHE BLOCKING
• Partition the matrix in question into sub-matrices
• Divide the larger problem into smaller sub-problems
• Main idea:
• Iterate over blocks as you perform the transpose, as opposed to the simplistic algorithm, which goes index by index, row by row
• Determining the size of these blocks will take some amount of thought and experimentation
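The blocking idea can be sketched as follows. This is a generic illustration, not a tuned solution for the lab: it assumes row-major flat arrays (A is M x N, B is N x M), and the function name `transpose_blocked` and the block-size parameter `bs` are hypothetical. The block size is left as a parameter precisely because picking it takes experimentation.

```c
#include <assert.h>
#include <stddef.h>

/* Transpose A (M x N) into B (N x M), one bs x bs sub-matrix at a time. */
static void transpose_blocked(size_t M, size_t N,
                              const int *A, int *B, size_t bs)
{
    assert(A != NULL && B != NULL && bs > 0);

    for (size_t ii = 0; ii < M; ii += bs) {
        for (size_t jj = 0; jj < N; jj += bs) {
            /* Clamp the block edges so partial blocks are handled. */
            size_t i_end = (ii + bs < M) ? ii + bs : M;
            size_t j_end = (jj + bs < N) ? jj + bs : N;

            /* Finish this sub-matrix before moving to the next one. */
            for (size_t i = ii; i < i_end; i++) {
                for (size_t j = jj; j < j_end; j++) {
                    B[j * M + i] = A[i * N + j];
                }
            }
        }
    }
}
```

The result is identical to the simple row-by-row transpose. Only the order of memory accesses changes, which is what affects the number of cache misses.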
![Page 21: Simulations of Memory Hierarchy LAB 2: CACHE LAB](https://reader035.vdocuments.site/reader035/viewer/2022062313/56649cf95503460f949ca85c/html5/thumbnails/21.jpg)
QUESTIONS TO PONDER
• What would happen if, instead of accessing each index in row order, you jumped from row to row within the same column?
• What would happen if you declared only 4 local variables as opposed to 12 local variables?
• Is it possible to get rid of the local variables altogether?
• What happens when accessing elements along the diagonal?
• What happens when the program is run in a different directory?
![Page 22: Simulations of Memory Hierarchy LAB 2: CACHE LAB](https://reader035.vdocuments.site/reader035/viewer/2022062313/56649cf95503460f949ca85c/html5/thumbnails/22.jpg)
(XKCD)