
Page 1

© 2004 Morgan Kaufmann Publishers

Multilevel cache

• Used to reduce the miss penalty to main memory
• First level designed
  – to reduce hit time
  – to be small in size, allowing a higher miss rate
  – usually implemented on the same die as the processor
• Second level designed
  – to reduce the miss rate (and hence the first level's miss penalty)
  – to be larger in size
  – can be on- or off-chip (built from SRAMs)

Page 2

Multilevel cache: example (page 505)

• Processor with base CPI = 1.0 (assuming all references hit in the primary cache)
• Clock rate 5 GHz
• Main memory access time of 100 ns (including miss handling)
• Miss rate per instruction at the primary cache is 2%
• How much faster will the processor be if we add a secondary cache with a 5 ns access time, large enough to reduce the miss rate to main memory to 0.5%?

Page 3

Solution

• Using total execution time
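The slide's worked equations did not survive the transcript. The computation it describes can be sketched as follows, using only the figures from the problem statement on the previous slide (the variable names are mine):

```python
# Worked version of the two-level cache example (page 505).
# All figures come from the problem statement; the names are mine.

clock_ghz = 5.0                      # 5 GHz -> 5 cycles per ns
cycles_per_ns = clock_ghz

base_cpi = 1.0
l1_miss_rate = 0.02                  # misses per instruction at the primary cache
mem_penalty = 100 * cycles_per_ns    # 100 ns -> 500 cycles
l2_penalty = 5 * cycles_per_ns       # 5 ns  -> 25 cycles
l2_miss_rate = 0.005                 # misses per instruction reaching main memory

# One level of cache: every primary miss pays the full trip to memory.
cpi_one_level = base_cpi + l1_miss_rate * mem_penalty       # 1 + 0.02*500 = 11.0

# Two levels: primary misses pay the L2 access; only L2 misses go to memory.
cpi_two_level = (base_cpi
                 + l1_miss_rate * l2_penalty                # 0.02*25   = 0.5
                 + l2_miss_rate * mem_penalty)              # 0.005*500 = 2.5

speedup = cpi_one_level / cpi_two_level
print(cpi_one_level, cpi_two_level, speedup)  # 11.0 4.0 2.75
```

Since both designs execute the same instruction count at the same clock rate, the ratio of total execution times reduces to the ratio of effective CPIs, giving a speedup of 2.75.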

Page 4

Cache Complexities

• Not always easy to understand the implications of caches:

[Two figures: "Theoretical behavior of Radix sort vs. Quicksort" and "Observed behavior of Radix sort vs. Quicksort", each plotting the two algorithms against Size (K items to sort), from 4 to 4096]

Page 5

Cache Complexities

• Here is why:
• Memory system performance is often the critical factor
  – multilevel caches and pipelined processors make it harder to predict outcomes
  – compiler optimizations to increase locality sometimes hurt ILP
    • Think of putting instructions that access the same data near each other in the code, leading to data hazards
• Difficult to predict the best algorithm: need experimental data

[Figure: Radix sort vs. Quicksort, plotted against Size (K items to sort), from 4 to 4096]

Page 6

Virtual memory

Page 7

Memory Hierarchy

[Figure: the memory hierarchy: Cache (SRAM), Main Memory (DRAM), Disk Storage (magnetic media)]

Page 8

Issues

• DRAM is too expensive to buy gigabytes of
  – Yet we want our programs to work even if they require more DRAM than we bought.
  – We also don't want a program that works on a machine with 128 MB of DRAM to stop working when we run it on a machine with only 64 MB of main memory.
• We run more than one program on the machine.
  – The sum of the memory needed by all of them usually exceeds the amount of available memory
  – We need to protect the programs from each other

Page 9

Virtual Memory

• Virtual memory technique: main memory can act as a cache for secondary storage (disk)
• Virtual memory is responsible for mapping blocks of memory (called pages) from one set of addresses (virtual addresses) to another set (physical addresses)

[Figure: virtual addresses are mapped by address translation either to physical addresses in main memory or to disk addresses]

Page 10

Virtual memory advantages

• Illusion of having more physical memory
  – Keep only the active portions of a program in RAM
• Program relocation
  – Maps a program's virtual addresses to physical addresses in memory
  – The program can be put anywhere in memory
    • no need for a single contiguous block of main memory; the program is relocated as a set of fixed-size pages
• Protection (of code and data between simultaneously running programs)
  – Each program has its own address space
  – Virtual memory translates each program's address space into physical addresses while enforcing protection

Page 11

Virtual memory terminology

• Blocks are called Pages
  – A virtual address consists of
    • a virtual page number
    • a page offset field (the low-order bits of the address)
• Misses are called Page faults
  – they are generally handled as an exception

[Figure: a 32-bit virtual address split into a Virtual page number (bits 31 down to 12) and a Page offset (bits 11 down to 0)]
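The split can be sketched in a few lines, assuming the 4 KB pages (12 offset bits) used on the following slides:

```python
PAGE_OFFSET_BITS = 12               # 4 KB pages: the offset is bits 11..0
PAGE_SIZE = 1 << PAGE_OFFSET_BITS

def split_virtual_address(va: int):
    """Split a 32-bit virtual address into (virtual page number, page offset)."""
    vpn = va >> PAGE_OFFSET_BITS    # high-order bits 31..12
    offset = va & (PAGE_SIZE - 1)   # low-order bits 11..0
    return vpn, offset

vpn, offset = split_virtual_address(0x12345678)
print(hex(vpn), hex(offset))  # 0x12345 0x678
```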

Page 12

Page faults

• Page faults: the data is not in memory, so retrieve it from disk
  – huge miss penalty [main memory is about 100,000 times faster than disk], thus pages should be fairly large (4 KB to 16 KB)
  – reducing page faults is important (LRU is worth the price)
  – faults can be handled in software instead of hardware [the overhead is small compared to the disk access time]
  – using write-through is too expensive, so we use write-back
• The structure that holds the information about the pages (i.e., whether a page is in memory or on disk) is called the Page Table
  – The page table is stored in memory
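The LRU replacement mentioned above can be sketched with an ordered dictionary. This is a toy model: the `capacity` of resident pages is a made-up parameter, and real systems only approximate LRU (e.g., with reference bits) because exact bookkeeping on every access is too expensive.

```python
from collections import OrderedDict

class LRUPageSet:
    """Toy model of LRU page replacement over a fixed number of frames."""
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.resident = OrderedDict()         # page number, in recency order

    def access(self, page: int) -> bool:
        """Touch a page; return True on a hit, False on a page fault."""
        if page in self.resident:
            self.resident.move_to_end(page)   # mark as most recently used
            return True
        if len(self.resident) >= self.capacity:
            self.resident.popitem(last=False) # evict the least recently used
        self.resident[page] = True
        return False

mem = LRUPageSet(capacity=2)
hits = [mem.access(p) for p in [1, 2, 1, 3, 2]]
print(hits)  # [False, False, True, False, False]
```

The fourth access (page 3) evicts page 2, not page 1, because page 1 was touched more recently; that is exactly the behavior worth paying for when misses cost a disk access.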

Page 13

Translation

[Figure: the CPU issues a 32-bit virtual address, split into a Virtual page number (bits 31 down to 12) and a Page offset (bits 11 down to 0). Translation produces a 30-bit physical address (an address in main memory): a Physical page number (bits 29 down to 12) followed by the same Page offset.]

Here the page size is 2^12 = 4 KB (determined by the number of bits in the page offset).
The number of allowed physical pages is 2^18.
Thus, main memory is at most 1 GB (2^30) while the virtual address space is 4 GB (2^32).

Page 14

Placing a page and finding it again

• The high penalty of a page fault necessitates optimizing page placement
  – Fully associative placement is attractive, as it allows the OS to replace any page using sophisticated LRU algorithms
    • A full search of main memory is impractical
  – Use a page table that indexes the memory and resides in memory
  – The page table is indexed by the page number from the virtual address (no need for tags)
  – There is a page table for each program
  – A page table register indicates the page table's location in memory
  – The page table may contain entries that are not in main memory but on disk

Page 15

[Figure: address translation through the page table. A page table register holds a pointer to the starting address of the page table in memory. The 20-bit Virtual page number (bits 31 down to 12) indexes the page table; each entry holds a valid bit and an 18-bit Physical page number. If the valid bit is 0, the page is not present in memory. The Physical page number is concatenated with the 12-bit Page offset to form the physical address.]

Here the page size is 2^12 = 4 KB (determined by the number of bits in the page offset).
The number of allowed physical pages is 2^18.
Thus, main memory is at most 1 GB (2^30) while the virtual address space is 4 GB (2^32).
The number of entries in the page table is 2^20 (very large).
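A minimal sketch of this lookup: a Python dict stands in for the valid-bit-plus-physical-page-number array, with a missing entry modeling valid bit = 0, and the mapping in the example is hypothetical.

```python
PAGE_OFFSET_BITS = 12                    # 4 KB pages, as in the figure

def translate(page_table: dict, va: int) -> int:
    """Translate a 32-bit virtual address using a flat page table.

    page_table maps virtual page number -> physical page number for
    resident pages; a missing entry models valid bit == 0.
    """
    vpn = va >> PAGE_OFFSET_BITS         # 20-bit virtual page number
    offset = va & ((1 << PAGE_OFFSET_BITS) - 1)
    if vpn not in page_table:
        raise LookupError("page fault")  # handled by the OS as an exception
    ppn = page_table[vpn]                # 18-bit physical page number
    return (ppn << PAGE_OFFSET_BITS) | offset

pt = {0x12345: 0x00ABC}                  # one resident page (hypothetical)
print(hex(translate(pt, 0x12345678)))    # 0xabc678
```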

Page 16

The OS Role

• The OS indexes the page table
• The OS moves pages into and out of memory
• When a process is created, the OS tries to reserve enough space on disk for all of its pages. This space is called the swap area.

[Figure: for each virtual page number, the page table holds a valid bit and either a physical page address (valid = 1, page in physical memory) or a disk address (valid = 0, page in disk storage)]

The page table maps each page in virtual memory to either a page in main memory or a page on disk.

Page 17

Problem

Consider a virtual memory system with the following properties:
• 40-bit virtual address
• 36-bit physical address
• 16 KB page size

What is the total size of the page table for each process on this processor, assuming that the valid, protection, dirty, and use bits take a total of 4 bits, and that all the virtual pages are in use?

Page 18

Solution

• The total size is equal to the number of entries times the size of each entry.
• Each page is 16 KB, and thus 14 bits of the virtual and physical address are used as the page offset.
• The remaining 40 - 14 = 26 bits of the virtual address constitute the virtual page number.
• There are thus 2^26 entries in the page table, one for each virtual page number.
• Each entry requires 36 - 14 = 22 bits to store the physical page number and an additional 4 bits for the valid, protection, dirty, and use bits.
• We round the 22 + 4 = 26 bits per entry up to a full 32-bit word, which gives a total size of 2^26 x 32 bits = 256 MB.
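The arithmetic above can be checked directly (the variable names are mine):

```python
# Recompute the page-table size from the problem's parameters.
virtual_bits = 40
physical_bits = 36
page_size = 16 * 1024                     # 16 KB
status_bits = 4                           # valid, protection, dirty, use

offset_bits = page_size.bit_length() - 1  # log2(16 KB) = 14
entries = 2 ** (virtual_bits - offset_bits)              # 2^26 virtual pages
entry_bits = (physical_bits - offset_bits) + status_bits # 22 + 4 = 26
entry_bits_rounded = 32                   # round up to a full word

total_bytes = entries * entry_bits_rounded // 8
print(total_bytes // 2**20, "MB")  # 256 MB
```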

Page 19

Performance of virtual memory

• We must access physical memory to read the page table and translate a virtual address into a physical one
• Then we access physical memory again to get (or store) the data
• A load instruction therefore performs at least 2 memory reads
• A store instruction performs at least 1 read and then a write

Page 20

Translation lookaside buffer

• We fix this performance problem by avoiding main memory in the translation from virtual to physical pages.
• We buffer the common translations in a Translation Lookaside Buffer (TLB), a fast cache memory dedicated to storing a small subset of valid virtual-to-physical translations.
• It is usually placed before the cache
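A sketch of a TLB sitting in front of a flat page table: fully associative, with a crude oldest-entry eviction (real TLBs use hardware LRU or random replacement), and the capacity and mappings below are made up for illustration.

```python
PAGE_OFFSET_BITS = 12   # assuming the 4 KB pages from the earlier slides

class TLB:
    """Tiny fully associative TLB model in front of a flat page table."""
    def __init__(self, page_table: dict, capacity: int = 4):
        self.page_table = page_table   # vpn -> ppn for resident pages
        self.entries = {}              # cached vpn -> ppn translations
        self.capacity = capacity
        self.hits = self.misses = 0

    def translate(self, va: int) -> int:
        vpn = va >> PAGE_OFFSET_BITS
        offset = va & ((1 << PAGE_OFFSET_BITS) - 1)
        if vpn in self.entries:                  # TLB hit: no memory access
            self.hits += 1
        else:                                    # TLB miss: walk the page table
            self.misses += 1                     # (KeyError here = page fault)
            if len(self.entries) >= self.capacity:
                self.entries.pop(next(iter(self.entries)))  # evict oldest
            self.entries[vpn] = self.page_table[vpn]
        return (self.entries[vpn] << PAGE_OFFSET_BITS) | offset

tlb = TLB({0x1: 0x7, 0x2: 0x9})
for va in [0x1234, 0x1ABC, 0x2000]:   # two accesses to page 0x1, one to 0x2
    tlb.translate(va)
print(tlb.hits, tlb.misses)  # 1 2
```

The second access to page 0x1 hits in the TLB and skips the in-memory page table entirely, which is exactly the memory access the previous slide was worried about.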

Page 21

Main design questions for memory hierarchy

• Where can a block be placed?
• How is a block found?
• Which block should be replaced on a miss?
• What happens on a write?

Page 22

HW

• HW: 7.10, 7.14, 7.20, 7.32
  – Due Dec 23
• Section problems: 7.9, 7.12, 7.29, 7.33

Page 23

Chapter 8

Page 24

Interfacing Processors and Peripherals (peripheral / external)

• I/O design is affected by many factors (expandability, resilience, dependability)
• Performance:
  – Measured by
    • access latency
    • throughput
  – Depends on
    • the connection between the devices and the system
    • the memory hierarchy
    • the operating system
• A variety of different users (e.g., banks, supercomputers, engineers)

Page 25

Which performance measure is important?

• Depends on the application
  – Multimedia applications: most I/O requests are long streams, so bandwidth is important
  – Tax file processing: lots of small I/O requests; need to handle a large number of small I/O requests simultaneously
  – ATM transactions: need both high throughput and short response time

Page 26

I/O Devices

• Very diverse devices
  – behavior (i.e., input vs. output)
  – partner (who is at the other end? a human or a machine)
  – data rate

Page 27

I/O Example: Disk Drives

• To access data:
  – seek: position the head over the proper track (3 to 14 ms avg.)
  – rotational latency: wait for the desired sector (half a rotation on average, i.e. 0.5 / (RPM/60) seconds)
  – transfer: grab the data (one or more sectors) at 30 to 80 MB/sec

[Figure: a disk drive's platters, with the tracks on each platter and the sectors within a track]
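As a sketch, the average access time simply adds these components; the drive parameters below are illustrative assumptions, not figures from the slides.

```python
# Average disk access time = seek + rotational latency + transfer (+ controller).
# The drive parameters in the example call are assumed, not from the slides.

def avg_access_ms(seek_ms: float, rpm: float, bytes_to_read: int,
                  transfer_mb_s: float, controller_ms: float = 0.0) -> float:
    rotational_ms = 0.5 / (rpm / 60) * 1000       # half a rotation, on average
    transfer_ms = bytes_to_read / (transfer_mb_s * 1e6) * 1000
    return seek_ms + rotational_ms + transfer_ms + controller_ms

# Reading one 512-byte sector from an assumed 10,000 RPM drive with a
# 6 ms average seek and a 50 MB/sec transfer rate:
t = avg_access_ms(seek_ms=6.0, rpm=10_000, bytes_to_read=512, transfer_mb_s=50.0)
print(round(t, 2), "ms")   # dominated by seek time and rotational latency
```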

Page 28

Example page 570

Page 29

Solution

Page 30

Dependability, reliability and availability

• Dependability: a fuzzy concept
  – Needs a reference specification
• The system alternates between two states:
  1. Service accomplishment: service delivered as specified
  2. Service interruption: delivered service differs from the specification
• Transition from 1 to 2: failure
• Transition from 2 to 1: restoration