1 1999 ©ucb cs 161 ch 7: memory hierarchy lecture 16 instructor: l.n. bhuyan bhuyan

23
1 1999 ©UCB CS 161 Ch 7: Memory Hierarchy LECTURE 16 Instructor: L.N. Bhuyan www.cs.ucr.edu/~bhuyan

Post on 20-Dec-2015

223 views

Category:

Documents


2 download

TRANSCRIPT

Page 1: 1 1999 ©UCB CS 161 Ch 7: Memory Hierarchy LECTURE 16 Instructor: L.N. Bhuyan bhuyan

1 1999 ©UCB

CS 161Ch 7: Memory Hierarchy

LECTURE 16

Instructor: L.N. Bhuyanwww.cs.ucr.edu/~bhuyan

Page 2: 1 1999 ©UCB CS 161 Ch 7: Memory Hierarchy LECTURE 16 Instructor: L.N. Bhuyan bhuyan

2 1999 ©UCB

With Load Bypass:

Average Access Time = Hit Time x (1 - Miss Rate) + Miss Penalty x Miss Rate

OR

Without Load Bypass

Average Memory Acess Time = Time for a hit + Miss rate x Miss penalty

Cache Access Time

Page 3: 1 1999 ©UCB CS 161 Ch 7: Memory Hierarchy LECTURE 16 Instructor: L.N. Bhuyan bhuyan

3 1999 ©UCB

Unified vs Split Caches

Unified Cache: • Low Miss ratio because more space available for either instruction or data• Low cache bandwidth because instruction and data cannot be read at the same time

due to one port.

Split Cache:• High miss ratio because either instructions or data may run out of space even though

space is available at other cache• High bandwidth because an instruction and data can be accessed at the same time.

Example:• 16KB I&D: Inst miss rate=0.64%, Data miss rate=6.47%• 32KB unified: Aggregate miss rate=1.99%

° Which is better (ignore L2 cache)?• Assume 33% data ops 75% accesses from instructions (1.0/1.33)• hit time=1, miss time=50• Note that data hit has 1 stall for unified cache (only one port)

AMATHarvard=75%x(1+0.64%x50)+25%x(1+6.47%x50) = 2.05AMATUnified=75%x(1+1.99%x50)+25%x(1+1+1.99%x50)= 2.24

ProcI-Cache-1

Proc

UnifiedCache-1

UnifiedCache-2

D-Cache-1

Proc

UnifiedCache-2

Page 4: 1 1999 ©UCB CS 161 Ch 7: Memory Hierarchy LECTURE 16 Instructor: L.N. Bhuyan bhuyan

4 1999 ©UCB

Static RAM (SRAM)°Six transistors in cross connected fashion

• Provides regular AND inverted outputs

• Implemented in CMOS process

Single Port 6-T SRAM Cell

Page 5: 1 1999 ©UCB CS 161 Ch 7: Memory Hierarchy LECTURE 16 Instructor: L.N. Bhuyan bhuyan

5 1999 ©UCB

Dynamic Random Access Memory - DRAM° DRAM organization is similar to SRAM except that

each bit of DRAM is constructed using a pass transistor and a capacitor, shown in next slide

° Less number of transistors/bit gives high density, but slow discharge through capacitor.

° Capacitor needs to be recharged or refreshed giving rise to high cycle time. Q: What is the difference between access time and cycle time?

° Uses a two-level decoder as shown later. Note that 2048 bits are accessed per row, but only one bit is used.

Page 6: 1 1999 ©UCB CS 161 Ch 7: Memory Hierarchy LECTURE 16 Instructor: L.N. Bhuyan bhuyan

6 1999 ©UCB

° SRAM cells exhibit high speed/poor density

° DRAM: simple transistor/capacitor pairs in high density form

Dynamic RAM

Word Line

Bit Line

C

Sense Amp

.

.

.

Page 7: 1 1999 ©UCB CS 161 Ch 7: Memory Hierarchy LECTURE 16 Instructor: L.N. Bhuyan bhuyan

7 1999 ©UCB

DRAM logical organization (4 Mbit)

° Square root of bits per RAS/CAS

Column Decoder

Sense Amps & I/O

Memory Array(2,048 x 2,048)

A0…A10

11

D

Q

Word LineStorage CellR

ow D

ecod

er…

•Access time of DRAM = Row access time + column access time + refreshing

Page 8: 1 1999 ©UCB CS 161 Ch 7: Memory Hierarchy LECTURE 16 Instructor: L.N. Bhuyan bhuyan

8 1999 ©UCB

Virtual Memory° Idea 1: Many Programs sharing DRAM Memory so that context switches can occur

° Idea 2: Allow program to be written without memory constraints – program can exceed the size of the main memory

° Idea 3: Relocation: Parts of the program can be placed at different locations in the memory instead of a big chunk.

° Virtual Memory:

(1) DRAM Memory holds many programs running at same time (processes)

(2) use DRAM Memory as a kind of “cache” for disk

Page 9: 1 1999 ©UCB CS 161 Ch 7: Memory Hierarchy LECTURE 16 Instructor: L.N. Bhuyan bhuyan

9 1999 ©UCB

Disk Technology in Brief°Disk is mechanical memory

°Disk Access Time =seek time + rotational delay + transfer time

• usually measured in milliseconds

°“Miss” to disk is extremely expensive• typical access time = millions of clock cycles

3600 - 7200 RPM rotation speed

tracks

R/W arm

Page 10: 1 1999 ©UCB CS 161 Ch 7: Memory Hierarchy LECTURE 16 Instructor: L.N. Bhuyan bhuyan

10 1999 ©UCB

Virtual Memory has own terminology° Each process has its own private “virtual address space” (e.g., 232 Bytes); CPU actually generates “virtual addresses”

° Each computer has a “physical address space” (e.g., 128 MegaBytes DRAM); also called “real memory”

° Address translation: mapping virtual addresses to physical addresses

• Allows multiple programs to use (different chunks of physical) memory at same time

• Also allows some chunks of virtual memory to be represented on disk, not in main memory (to exploit memory hierarchy)

Page 11: 1 1999 ©UCB CS 161 Ch 7: Memory Hierarchy LECTURE 16 Instructor: L.N. Bhuyan bhuyan

11 1999 ©UCB

Mapping Virtual Memory to Physical Memory

0

Physical Memory

Virtual Memory

Heap

64 MB

° Divide Memory into equal sized“chunks” (say, 4KB each)

0

° Any chunk of Virtual Memory assigned to any chunk of Physical Memory (“page”)

Stack

Heap

Static

Code

SingleProcess

Page 12: 1 1999 ©UCB CS 161 Ch 7: Memory Hierarchy LECTURE 16 Instructor: L.N. Bhuyan bhuyan

12 1999 ©UCB

Handling Page Faults°A page fault is like a cache miss

• Must find page in lower level of hierarchy

° If valid bit is zero, the Physical Page Number points to a page on disk

°When OS starts new process, it creates space on disk for all the pages of the process, sets all valid bits in page table to zero, and all Physical Page Numbers to point to disk

• called Demand Paging - pages of the process are loaded from disk only as needed

Page 13: 1 1999 ©UCB CS 161 Ch 7: Memory Hierarchy LECTURE 16 Instructor: L.N. Bhuyan bhuyan

13 1999 ©UCB

Comparing the 2 levels of hierarchy° Cache Virtual Memory

° Block or Line Page

° Miss Page Fault

° Block Size: 32-64B Page Size: 4K-16KB

° Placement: Fully AssociativeDirect Mapped, N-way Set Associative

° Replacement: Least Recently UsedLRU or Random (LRU) approximation

° Write Thru or Back Write Back

° How Managed: Hardware + SoftwareHardware (Operating System)

Page 14: 1 1999 ©UCB CS 161 Ch 7: Memory Hierarchy LECTURE 16 Instructor: L.N. Bhuyan bhuyan

14 1999 ©UCB

How to Perform Address Translation?

°VM divides memory into equal sized pages

°Address translation relocates entire pages

• offsets within the pages do not change

• if make page size a power of two, the virtual address separates into two fields:

• like cache index, offset fields

Virtual Page Number Page Offset virtual address

Page 15: 1 1999 ©UCB CS 161 Ch 7: Memory Hierarchy LECTURE 16 Instructor: L.N. Bhuyan bhuyan

15 1999 ©UCB

Mapping Virtual to Physical Address

Virtual Page Number Page Offset

Page OffsetPhysical Page Number

Translation

31 30 29 28 27 .………………….12 11 10

29 28 27 .………………….12 11 10

9 8 ……..……. 3 2 1 0

Virtual Address

Physical Address

9 8 ……..……. 3 2 1 0

1KB page size

Page 16: 1 1999 ©UCB CS 161 Ch 7: Memory Hierarchy LECTURE 16 Instructor: L.N. Bhuyan bhuyan

16 1999 ©UCB

Address Translation°Want fully associative page placement

°How to locate the physical page?

°Search impractical (too many pages)

°A page table is a data structure which contains the mapping of virtual pages to physical pages

• There are several different ways, all up to the operating system, to keep this data around

°Each process running in the system has its own page table

Page 17: 1 1999 ©UCB CS 161 Ch 7: Memory Hierarchy LECTURE 16 Instructor: L.N. Bhuyan bhuyan

17 1999 ©UCB

Address Translation: Page Table

Virtual Address (VA):virtual page nbr offset

Page TableRegister

Page Table is located in physical memory

indexintopagetable

+

PhysicalMemory

Address (PA)

Access Rights: None, Read Only, Read/Write, Executable

Page Table

Val-id

AccessRights

PhysicalPageNumber

V A.R. P. P. N.0 A.R.

V A.R. P. P. N.

...

...

disk

Page 18: 1 1999 ©UCB CS 161 Ch 7: Memory Hierarchy LECTURE 16 Instructor: L.N. Bhuyan bhuyan

18 1999 ©UCB

Optimizing for Space°Page Table too big!

• 4GB Virtual Address Space ÷ 4 KB page 220 (~ 1 million) Page Table Entries 4 MB just for Page Table of single process!

°Variety of solutions to tradeoff Page Table size for slower performance

° Use a limit register to restrict page table size and let it grow with more pages,Multilevel page table, Paging page tables, etc.

(Take O/S Class to learn more)

Page 19: 1 1999 ©UCB CS 161 Ch 7: Memory Hierarchy LECTURE 16 Instructor: L.N. Bhuyan bhuyan

19 1999 ©UCB

How to Translate Fast?° Problem: Virtual Memory requires two memory accesses!• one to translate Virtual Address into Physical

Address (page table lookup)• one to transfer the actual data (cache hit)• But Page Table is in physical memory!

° Observation: since there is locality in pages of data, must be locality in virtual addresses of those pages!

° Why not create a cache of virtual to physical address translations to make translation fast? (smaller is faster)

° For historical reasons, such a “page table cache” is called a Translation Lookaside Buffer, or TLB

Page 20: 1 1999 ©UCB CS 161 Ch 7: Memory Hierarchy LECTURE 16 Instructor: L.N. Bhuyan bhuyan

20 1999 ©UCB

Typical TLB Format

Virtual Physical Valid Ref Dirty AccessPage Nbr Page Nbr Rights

•TLB just a cache of the page table mappings

• Dirty: since use write back, need to know whether or not to write page to disk when replaced• Ref: Used to calculate LRU on replacement

• TLB access time comparable to cache (much less than main memory access time)

“tag” “data”

Page 21: 1 1999 ©UCB CS 161 Ch 7: Memory Hierarchy LECTURE 16 Instructor: L.N. Bhuyan bhuyan

21 1999 ©UCB

Translation Look-Aside Buffers

•TLB is usually small, typically 32-4,096 entries

• Like any other cache, the TLB can be fully associative, set associative, or direct mapped

Processor TLB Cache MainMemory

misshit

data

hit

miss

DiskMemory

OS FaultHandler

page fault/protection violation

PageTable

data

virtualaddr.

physicaladdr.

Page 22: 1 1999 ©UCB CS 161 Ch 7: Memory Hierarchy LECTURE 16 Instructor: L.N. Bhuyan bhuyan

22 1999 ©UCB

Valid Tag Data

Page offset

Page offset

Virtual page number

Physical page numberValid

1220

20

16 14

Cache index

32

DataCache hit

2Byte

offset

Dirty Tag

TLB hit

Physical page number

Physical address tag

31 30 29 15 14 13 12 11 10 9 8 3 2 1 0 DECStation 3100/MIPS R2000Virtual Address

TLB

Cache

64 entries, fully

associative

Physical Address

16K entries,

direct mapped

Page 23: 1 1999 ©UCB CS 161 Ch 7: Memory Hierarchy LECTURE 16 Instructor: L.N. Bhuyan bhuyan

23 1999 ©UCB

Real Stuff: Pentium Pro Memory Hierarchy° Address Size: 32 bits (VA, PA)

° VM Page Size: 4 KB, 4 MB

° TLB organization: separate i,d TLBs(i-TLB: 32 entries, d-TLB: 64 entries)4-way set associativeLRU approximatedhardware handles

miss

° L1 Cache: 8 KB, separate i,d 4-way set associative

LRU approximated32 byte blockwrite back

° L2 Cache: 256 or 512 KB