
Page 1

© 2004 Morgan Kaufmann Publishers

Multilevel cache

• Used to reduce the miss penalty to main memory
• First level designed
  – to reduce hit time
  – to be small in size, allowing a higher miss rate
  – usually implemented on the same die as the processor
• Second level designed
  – to reduce the miss rate (and hence the first level's miss penalty)
  – to be larger in size
  – can be on- or off-chip (built from SRAMs)

Page 2

Multilevel cache: example (page 505)

• Processor with base CPI = 1.0 (assuming all references hit in the primary cache)
• Clock rate 5 GHz
• Main memory access time of 100 ns (including miss handling)
• Miss rate per instruction at the primary cache is 2%
• How much faster will the processor be if we add a secondary cache with a 5 ns access time, large enough to reduce the miss rate to main memory to 0.5%?

Page 3

Solution

• Using total execution time
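The slide's worked equations did not survive the transcript. The computation it describes can be sketched as follows, using only the figures from the problem statement on the previous slide (the variable names are mine):

```python
# Worked version of the two-level cache example (page 505).
# All figures come from the problem statement; the names are mine.

clock_ghz = 5.0                      # 5 GHz -> 5 cycles per ns
cycles_per_ns = clock_ghz

base_cpi = 1.0
l1_miss_rate = 0.02                  # misses per instruction at the primary cache
mem_penalty = 100 * cycles_per_ns    # 100 ns -> 500 cycles
l2_penalty = 5 * cycles_per_ns       # 5 ns  -> 25 cycles
l2_miss_rate = 0.005                 # misses per instruction reaching main memory

# One level of cache: every primary miss pays the full trip to memory.
cpi_one_level = base_cpi + l1_miss_rate * mem_penalty       # 1 + 0.02*500 = 11.0

# Two levels: primary misses pay the L2 access; only L2 misses go to memory.
cpi_two_level = (base_cpi
                 + l1_miss_rate * l2_penalty                # 0.02*25   = 0.5
                 + l2_miss_rate * mem_penalty)              # 0.005*500 = 2.5

speedup = cpi_one_level / cpi_two_level
print(cpi_one_level, cpi_two_level, speedup)  # 11.0 4.0 2.75
```

Since both designs execute the same instruction count at the same clock rate, the ratio of total execution times reduces to the ratio of effective CPIs, giving a speedup of 2.75.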

Page 4

Cache Complexities

• Not always easy to understand the implications of caches:

[Two figures: "Theoretical behavior of Radix sort vs. Quicksort" and "Observed behavior of Radix sort vs. Quicksort", each plotting the two algorithms against Size (K items to sort), from 4 to 4096]

Page 5

Cache Complexities

• Here is why:
• Memory system performance is often the critical factor
  – multilevel caches and pipelined processors make it harder to predict outcomes
  – compiler optimizations to increase locality sometimes hurt ILP
    • Think of putting instructions that access the same data near each other in the code, leading to data hazards
• Difficult to predict the best algorithm: need experimental data

[Figure: Radix sort vs. Quicksort, plotted against Size (K items to sort), from 4 to 4096]

Page 6

Virtual memory

Page 7

Memory Hierarchy

[Figure: the memory hierarchy: Cache (SRAM), Main Memory (DRAM), Disk Storage (magnetic media)]

Page 8

Issues

• DRAM is too expensive to buy gigabytes of
  – Yet we want our programs to work even if they require more DRAM than we bought.
  – We also don't want a program that works on a machine with 128 MB of DRAM to stop working when we run it on a machine with only 64 MB of main memory.
• We run more than one program on the machine.
  – The sum of the memory needed by all of them usually exceeds the amount of available memory
  – We need to protect the programs from each other

Page 9

Virtual Memory

• Virtual memory technique: main memory can act as a cache for secondary storage (disk)
• Virtual memory is responsible for mapping blocks of memory (called pages) from one set of addresses (virtual addresses) to another set (physical addresses)

[Figure: virtual addresses are mapped by address translation either to physical addresses in main memory or to disk addresses]

Page 10

Virtual memory advantages

• Illusion of having more physical memory
  – Keep only the active portions of a program in RAM
• Program relocation
  – Maps a program's virtual addresses to physical addresses in memory
  – The program can be put anywhere in memory
    • no need for a single contiguous block of main memory; the program is relocated as a set of fixed-size pages
• Protection (of code and data between simultaneously running programs)
  – Each program has its own address space
  – Virtual memory translates each program's address space into physical addresses while enforcing protection

Page 11

Virtual memory terminology

• Blocks are called Pages
  – A virtual address consists of
    • a virtual page number
    • a page offset field (the low-order bits of the address)
• Misses are called Page faults
  – they are generally handled as an exception

[Figure: a 32-bit virtual address split into a Virtual page number (bits 31 down to 12) and a Page offset (bits 11 down to 0)]
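The split can be sketched in a few lines, assuming the 4 KB pages (12 offset bits) used on the following slides:

```python
PAGE_OFFSET_BITS = 12               # 4 KB pages: the offset is bits 11..0
PAGE_SIZE = 1 << PAGE_OFFSET_BITS

def split_virtual_address(va: int):
    """Split a 32-bit virtual address into (virtual page number, page offset)."""
    vpn = va >> PAGE_OFFSET_BITS    # high-order bits 31..12
    offset = va & (PAGE_SIZE - 1)   # low-order bits 11..0
    return vpn, offset

vpn, offset = split_virtual_address(0x12345678)
print(hex(vpn), hex(offset))  # 0x12345 0x678
```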

Page 12

Page faults

• Page faults: the data is not in memory, so retrieve it from disk
  – huge miss penalty [main memory is about 100,000 times faster than disk], thus pages should be fairly large (4 KB to 16 KB)
  – reducing page faults is important (LRU is worth the price)
  – faults can be handled in software instead of hardware [the overhead is small compared to the disk access time]
  – using write-through is too expensive, so we use write-back
• The structure that holds the information about the pages (i.e., whether a page is in memory or on disk) is called the Page Table
  – The page table is stored in memory
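The LRU replacement mentioned above can be sketched with an ordered dictionary. This is a toy model: the `capacity` of resident pages is a made-up parameter, and real systems only approximate LRU (e.g., with reference bits) because exact bookkeeping on every access is too expensive.

```python
from collections import OrderedDict

class LRUPageSet:
    """Toy model of LRU page replacement over a fixed number of frames."""
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.resident = OrderedDict()         # page number, in recency order

    def access(self, page: int) -> bool:
        """Touch a page; return True on a hit, False on a page fault."""
        if page in self.resident:
            self.resident.move_to_end(page)   # mark as most recently used
            return True
        if len(self.resident) >= self.capacity:
            self.resident.popitem(last=False) # evict the least recently used
        self.resident[page] = True
        return False

mem = LRUPageSet(capacity=2)
hits = [mem.access(p) for p in [1, 2, 1, 3, 2]]
print(hits)  # [False, False, True, False, False]
```

The fourth access (page 3) evicts page 2, not page 1, because page 1 was touched more recently; that is exactly the behavior worth paying for when misses cost a disk access.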

Page 13

Translation

[Figure: the CPU issues a 32-bit virtual address, split into a Virtual page number (bits 31 down to 12) and a Page offset (bits 11 down to 0). Translation produces a 30-bit physical address (an address in main memory): a Physical page number (bits 29 down to 12) followed by the same Page offset.]

Here the page size is 2^12 = 4 KB (determined by the number of bits in the page offset).
The number of allowed physical pages is 2^18.
Thus, main memory is at most 1 GB (2^30) while the virtual address space is 4 GB (2^32).

Page 14

Placing a page and finding it again

• The high penalty of a page fault necessitates optimizing page placement
  – Fully associative placement is attractive, as it allows the OS to replace any page using sophisticated LRU algorithms
    • A full search of main memory is impractical
  – Use a page table that indexes the memory and resides in memory
  – The page table is indexed by the page number from the virtual address (no need for tags)
  – There is a page table for each program
  – A page table register indicates the page table's location in memory
  – The page table may contain entries that are not in main memory but on disk

Page 15

[Figure: address translation through the page table. A page table register holds a pointer to the starting address of the page table in memory. The 20-bit Virtual page number (bits 31 down to 12) indexes the page table; each entry holds a valid bit and an 18-bit Physical page number. If the valid bit is 0, the page is not present in memory. The Physical page number is concatenated with the 12-bit Page offset to form the physical address.]

Here the page size is 2^12 = 4 KB (determined by the number of bits in the page offset).
The number of allowed physical pages is 2^18.
Thus, main memory is at most 1 GB (2^30) while the virtual address space is 4 GB (2^32).
The number of entries in the page table is 2^20 (very large).
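A minimal sketch of this lookup: a Python dict stands in for the valid-bit-plus-physical-page-number array, with a missing entry modeling valid bit = 0, and the mapping in the example is hypothetical.

```python
PAGE_OFFSET_BITS = 12                    # 4 KB pages, as in the figure

def translate(page_table: dict, va: int) -> int:
    """Translate a 32-bit virtual address using a flat page table.

    page_table maps virtual page number -> physical page number for
    resident pages; a missing entry models valid bit == 0.
    """
    vpn = va >> PAGE_OFFSET_BITS         # 20-bit virtual page number
    offset = va & ((1 << PAGE_OFFSET_BITS) - 1)
    if vpn not in page_table:
        raise LookupError("page fault")  # handled by the OS as an exception
    ppn = page_table[vpn]                # 18-bit physical page number
    return (ppn << PAGE_OFFSET_BITS) | offset

pt = {0x12345: 0x00ABC}                  # one resident page (hypothetical)
print(hex(translate(pt, 0x12345678)))    # 0xabc678
```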

Page 16

The OS Role

• The OS indexes the page table
• The OS moves pages into and out of memory
• When a process is created, the OS tries to reserve enough space on disk for all of its pages. This space is called the swap area.

[Figure: for each virtual page number, the page table holds a valid bit and either a physical page address (valid = 1, page in physical memory) or a disk address (valid = 0, page in disk storage)]

The page table maps each page in virtual memory to either a page in main memory or a page on disk.

Page 17

Problem

Consider a virtual memory system with the following properties:
• 40-bit virtual address
• 36-bit physical address
• 16 KB page size

What is the total size of the page table for each process on this processor, assuming that the valid, protection, dirty, and use bits take a total of 4 bits, and that all the virtual pages are in use?

Page 18

Solution

• The total size is equal to the number of entries times the size of each entry.
• Each page is 16 KB, and thus 14 bits of the virtual and physical address are used as the page offset.
• The remaining 40 - 14 = 26 bits of the virtual address constitute the virtual page number.
• There are thus 2^26 entries in the page table, one for each virtual page number.
• Each entry requires 36 - 14 = 22 bits to store the physical page number and an additional 4 bits for the valid, protection, dirty, and use bits.
• We round the 22 + 4 = 26 bits per entry up to a full 32-bit word, which gives a total size of 2^26 x 32 bits = 256 MB.
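The arithmetic above can be checked directly (the variable names are mine):

```python
# Recompute the page-table size from the problem's parameters.
virtual_bits = 40
physical_bits = 36
page_size = 16 * 1024                     # 16 KB
status_bits = 4                           # valid, protection, dirty, use

offset_bits = page_size.bit_length() - 1  # log2(16 KB) = 14
entries = 2 ** (virtual_bits - offset_bits)              # 2^26 virtual pages
entry_bits = (physical_bits - offset_bits) + status_bits # 22 + 4 = 26
entry_bits_rounded = 32                   # round up to a full word

total_bytes = entries * entry_bits_rounded // 8
print(total_bytes // 2**20, "MB")  # 256 MB
```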

Page 19

Performance of virtual memory

• We must access physical memory to read the page table and translate a virtual address into a physical one
• Then we access physical memory again to get (or store) the data
• A load instruction therefore performs at least 2 memory reads
• A store instruction performs at least 1 read and then a write

Page 20

Translation lookaside buffer

• We fix this performance problem by avoiding main memory in the translation from virtual to physical pages.
• We buffer the common translations in a Translation Lookaside Buffer (TLB), a fast cache memory dedicated to storing a small subset of valid virtual-to-physical translations.
• It is usually placed before the cache
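A sketch of a TLB sitting in front of a flat page table: fully associative, with a crude oldest-entry eviction (real TLBs use hardware LRU or random replacement), and the capacity and mappings below are made up for illustration.

```python
PAGE_OFFSET_BITS = 12   # assuming the 4 KB pages from the earlier slides

class TLB:
    """Tiny fully associative TLB model in front of a flat page table."""
    def __init__(self, page_table: dict, capacity: int = 4):
        self.page_table = page_table   # vpn -> ppn for resident pages
        self.entries = {}              # cached vpn -> ppn translations
        self.capacity = capacity
        self.hits = self.misses = 0

    def translate(self, va: int) -> int:
        vpn = va >> PAGE_OFFSET_BITS
        offset = va & ((1 << PAGE_OFFSET_BITS) - 1)
        if vpn in self.entries:                  # TLB hit: no memory access
            self.hits += 1
        else:                                    # TLB miss: walk the page table
            self.misses += 1                     # (KeyError here = page fault)
            if len(self.entries) >= self.capacity:
                self.entries.pop(next(iter(self.entries)))  # evict oldest
            self.entries[vpn] = self.page_table[vpn]
        return (self.entries[vpn] << PAGE_OFFSET_BITS) | offset

tlb = TLB({0x1: 0x7, 0x2: 0x9})
for va in [0x1234, 0x1ABC, 0x2000]:   # two accesses to page 0x1, one to 0x2
    tlb.translate(va)
print(tlb.hits, tlb.misses)  # 1 2
```

The second access to page 0x1 hits in the TLB and skips the in-memory page table entirely, which is exactly the memory access the previous slide was worried about.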

Page 21

Main design questions for memory hierarchy

• Where can a block be placed?
• How is a block found?
• Which block should be replaced on a miss?
• What happens on a write?

Page 22

HW

• HW: 7.10, 7.14, 7.20, 7.32
  – Due Dec 23
• Section problems: 7.9, 7.12, 7.29, 7.33

Page 23

Chapter 8

Page 24

Interfacing Processors and Peripherals (peripheral / external)

• I/O design is affected by many factors (expandability, resilience, dependability)
• Performance:
  – Measured by
    • access latency
    • throughput
  – Depends on
    • the connection between the devices and the system
    • the memory hierarchy
    • the operating system
• A variety of different users (e.g., banks, supercomputers, engineers)

Page 25

Which performance measure is important?

• Depends on the application
  – Multimedia applications: most I/O requests are long streams, so bandwidth is important
  – Tax file processing: lots of small I/O requests; need to handle a large number of small I/O requests simultaneously
  – ATM transactions: need both high throughput and short response time

Page 26

I/O Devices

• Very diverse devices
  – behavior (i.e., input vs. output)
  – partner (who is at the other end? a human or a machine)
  – data rate

Page 27

I/O Example: Disk Drives

• To access data:
  – seek: position the head over the proper track (3 to 14 ms avg.)
  – rotational latency: wait for the desired sector (half a rotation on average, i.e. 0.5 / (RPM/60) seconds)
  – transfer: grab the data (one or more sectors) at 30 to 80 MB/sec

[Figure: a disk drive's platters, with the tracks on each platter and the sectors within a track]
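As a sketch, the average access time simply adds these components; the drive parameters below are illustrative assumptions, not figures from the slides.

```python
# Average disk access time = seek + rotational latency + transfer (+ controller).
# The drive parameters in the example call are assumed, not from the slides.

def avg_access_ms(seek_ms: float, rpm: float, bytes_to_read: int,
                  transfer_mb_s: float, controller_ms: float = 0.0) -> float:
    rotational_ms = 0.5 / (rpm / 60) * 1000       # half a rotation, on average
    transfer_ms = bytes_to_read / (transfer_mb_s * 1e6) * 1000
    return seek_ms + rotational_ms + transfer_ms + controller_ms

# Reading one 512-byte sector from an assumed 10,000 RPM drive with a
# 6 ms average seek and a 50 MB/sec transfer rate:
t = avg_access_ms(seek_ms=6.0, rpm=10_000, bytes_to_read=512, transfer_mb_s=50.0)
print(round(t, 2), "ms")   # dominated by seek time and rotational latency
```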

Page 28

Example page 570

Page 29

Solution

Page 30

Dependability, reliability and availability

• Dependability: a fuzzy concept
  – Needs a reference specification
• The system alternates between two states:
  1. Service accomplishment: service delivered as specified
  2. Service interruption: delivered service differs from the specification
• Transition from 1 to 2: failure
• Transition from 2 to 1: restoration