TRANSCRIPT
CS 3723 Operating Systems: Memory Management (SGG-08)
Department of Computer Science @ UTSA
Instructor: Dr. Tongping Liu
Thanks to Dr. Dakai Zhu and Dr. Palden Lama for providing their slides.
Memory Management
❚ Background
❚ Swapping
❚ Contiguous Memory Allocation and Fragmentation
❚ Paging
❚ Structure of the Page Table
❚ TLB
Objectives
❚ To provide a detailed description of various ways of organizing memory hardware
❚ To discuss various memory-management techniques, including paging and segmentation
❚ To provide a detailed description of the Intel Pentium, which supports both pure segmentation and segmentation with paging
❚ The CPU can directly access main memory and registers only
❚ But programs and data are stored on disk, so they must be brought (from disk) into memory
❚ Memory accesses can be a bottleneck
Ø Cache sits between memory and the CPU registers
❚ Memory hierarchy
Ø Cache: small, fast, expensive; SRAM
Ø Main memory: medium speed; DRAM
Ø Disk: slow, cheap, non-volatile storage
[Figure: hierarchy of CPU (processor/ALU), internal memory, and I/O devices (disks)]
Background
❚ Think of memory as an array of words containing program instructions and data
❚ How do we execute a program?
Ø Fetch an instruction → decode → may fetch operands → execute → may store results
❚ The memory unit sees a stream of ADDRESSES
❚ How do we manage and protect main memory while sharing it among multiple processes?
Ø Keeping multiple processes in memory is essential to improving CPU utilization
Simple one: Base and Limit Registers
❚ Memory protection is required to ensure correct operation
❚ A pair of base and limit registers defines the logical address space of a process; every memory access is checked by hardware. Any problem? Too slow
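The base-and-limit check above can be sketched as follows. This is a minimal illustration, not actual hardware behavior; the register values are hypothetical.

```python
# Sketch of the hardware base/limit check: every logical address a
# user process generates is validated against the limit register
# before the base register relocates it into physical memory.
def check_access(logical_addr, base, limit):
    """Return the physical address, or trap on a protection violation."""
    if 0 <= logical_addr < limit:
        return base + logical_addr          # inside the process's space
    raise MemoryError("trap: addressing error")  # OS terminates/handles

# A process with base 300040 and limit 120900 may legally touch
# logical addresses 0 .. 120899 only.
```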
2
How to Bind Instructions and Data to Memory
❚ Address binding of instructions and data to memory addresses can happen at three different stages
Ø Compile time: if the memory location is known beforehand, absolute code can be generated; must recompile the code if the starting location changes (e.g., DOS .com programs)
Ø Load time: the compiler generates relocatable code; final binding occurs at load time
Ø Execution time: binding is delayed until execution time if the process can be moved during its execution from one memory segment to another
Logical vs. Physical Address Space
❚ Logical address
Ø Generated by the CPU; also referred to as a virtual address
❚ Physical address
Ø Address seen by the memory unit
❚ Logical and physical addresses are the same in compile-time and load-time address-binding schemes
❚ Logical (virtual) and physical addresses differ in the execution-time address-binding scheme
Ø The mapping from logical address to physical address is done by a hardware device called the memory-management unit (MMU)
Ø We will mainly study how this mapping is done and what hardware support is needed
Memory-Management Unit (MMU)
❚ Hardware device that maps logical (virtual) addresses to physical addresses
❚ In a simple MMU, the value in the relocation register (base) is added to every address generated by a user process at the time it is sent to memory
❚ The user program deals with logical addresses, not real physical addresses
Dynamic relocation using a relocation register
Memory-Management Unit (MMU)
❚ Logical addresses are in the range 0 ~ max; physical addresses are in the range R+0 to R+max, for a base value R
❚ The user program generates only logical addresses and thinks that the process runs in locations 0 to max
❚ Logical addresses must be mapped to physical addresses before they are used
Dynamic relocation using a relocation register
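Dynamic relocation with a relocation register reduces to one addition per access. A minimal sketch (the base value 14000 is the textbook-style example, chosen here for illustration):

```python
# Simple MMU: the relocation (base) register is added to every logical
# address the process generates, so a program that believes it runs at
# addresses 0..max actually occupies R..R+max in physical memory.
RELOCATION_REGISTER = 14000   # hypothetical base value R

def mmu_translate(logical_addr):
    # No limit check here; this models only the relocation step.
    return RELOCATION_REGISTER + logical_addr

# logical address 346 maps to physical address 14346
```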
Dynamic Loading
❚ Why dynamic loading?
Ø Without it, the size of a process is limited to that of physical memory
❚ Dynamic loading:
Ø Dynamically load routines when they are called
Ø All other routines are kept on disk in a loadable format
❚ Better memory-space utilization, since unused routines are never loaded
❚ Useful when large amounts of code are needed to handle infrequently occurring cases, such as error handling
Dynamic Linking and Shared Libraries
❚ Static linking
Ø System language and library routines are included in the binary code
❚ Dynamic linking: similar to dynamic loading
Ø Linking is postponed until execution time
Ø Without it, every library has a copy in each executable file, wasting both disk space and memory
Ø A stub is used to locate and load the appropriate memory-resident library routine when the routine is not yet present
Ø If it is already present, there is no need to load it again (PLT)
❚ Dynamic linking is particularly useful for libraries (one copy, transparent updates)
Memory Management
❚ Background
❚ Swapping
❚ Contiguous Memory Allocation and Fragmentation
❚ Paging
❚ Structure of the Page Table
❚ TLB
Swapping
❚ Consider a multiprogramming environment:
Ø Each program must be in memory to be executed
Ø Processes come into memory, and leave memory when execution completes
❚ What if no free region is big enough? Reject it! But if you want to support more processes:
Ø Swap an old process out to disk (waiting for long I/O, quantum expired, etc.)
[Figure: sequence of memory snapshots showing processes A–D being admitted, swapped out, and swapped back in]
Swapping
❚ A process can be swapped temporarily out of memory to a backing store, and then brought back into memory for continued execution
Ø Backing store – large enough to accommodate copies of all memory images for all users; must provide direct access to these memory images
Ø Roll out, roll in – swapping variant used for priority-based scheduling algorithms; a lower-priority process is swapped out so a higher-priority process can be loaded and executed
❚ Swapping can free up memory for additional processes
Swapping (cont’d)
❚ The major part of swap time is transfer time
Ø Total transfer time is directly proportional to the amount of memory swapped (e.g., a 10 MB process / 40 MB per sec = 0.25 sec)
Ø May take too much time to be used often
❚ Standard swapping requires too much swapping time
❚ Modified versions of swapping are found on many systems (e.g., UNIX, Linux, and Windows), but it is often disabled
Memory Management
❚ Background
❚ Swapping
❚ Contiguous Memory Allocation and Fragmentation
❚ Paging
❚ Structure of the Page Table
❚ TLB
Contiguous Allocation
❚ Main memory is usually divided into two partitions:
Ø Resident operating system, usually held in low memory
Ø User processes, usually held in high memory
❚ Relocation registers are used to protect user processes from each other, and from changing operating-system code and data
Ø The MMU maps logical addresses to physical addresses dynamically
Ø But the physical addresses must be contiguous
Contiguous Allocation (Cont.)
❚ Multiple-partition allocation
Ø Hole – block of available memory
ü Holes of various sizes are scattered throughout memory
Ø When a process arrives, the OS allocates memory from a hole large enough to accommodate it
Ø The operating system maintains information about: a) allocated partitions, b) free partitions (holes)
"OS"
process 5"
process 8"
process 2"
OS"
process 5"
process 2"
OS"
process 5"
process 2"
OS"
process 5"process 9"
process 2"
process 9"
process 10"
OS"
process 9"
process 2"
process 10"
.
Dynamic Storage-Allocation Problem
How to satisfy a request of size n from a list of free holes?
❚ First-fit: allocate the first hole that is big enough
❚ Best-fit: allocate the smallest hole that is big enough
Ø Must search the entire list, unless it is ordered by size
Ø Produces the smallest leftover hole
❚ Worst-fit: allocate the largest hole
Ø Must also search the entire list
Ø Produces the largest leftover hole
❚ First-fit and best-fit are better than worst-fit in terms of speed and storage utilization, but all suffer from fragmentation
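The three placement strategies can be sketched over a list of hole sizes. This is an illustrative sketch (the hole list is made up); each function returns the index of the chosen hole, or None if nothing fits.

```python
# Placement strategies for the dynamic storage-allocation problem.
def first_fit(holes, n):
    for i, size in enumerate(holes):
        if size >= n:
            return i                 # first hole big enough
    return None

def best_fit(holes, n):
    fits = [(size, i) for i, size in enumerate(holes) if size >= n]
    return min(fits)[1] if fits else None   # smallest hole that fits

def worst_fit(holes, n):
    fits = [(size, i) for i, size in enumerate(holes) if size >= n]
    return max(fits)[1] if fits else None   # largest hole

# For holes of [100, 500, 200, 300, 600] KB and a 212 KB request:
# first-fit picks hole 1 (500), best-fit hole 3 (300), worst-fit hole 4 (600)
```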
Fragmentation
❚ External fragmentation
Ø Total memory space exists to satisfy a request, but it is not contiguous
❚ Internal fragmentation
Ø Allocated memory may be slightly larger than the requested memory
Ø This size difference is internal fragmentation
❚ How can we reduce external fragmentation?
Ø Compaction: move memory contents to place all free memory together in one large block; possible only if relocation is dynamic and is done at execution time
Ø I/O problem → large overhead
❚ How about not requiring programs to be loaded contiguously?
OS"
process 9"
process 2"
process 10"
Memory Management
❚ Background
❚ Swapping
❚ Contiguous Memory Allocation and Fragmentation
❚ Paging
❚ Structure of the Page Table
❚ TLB
Paging: Basic Ideas
[Figure: logical memory pages 0–3 (0–4K … 12–16K) mapped through a page table to scattered frames of a 32 KB physical memory]
❚ A page is mapped to a frame
❚ Divide physical memory into fixed-sized blocks called frames
Ø Size is a power of 2, between 512 bytes and 16 MB or more
❚ Divide logical memory into blocks of the same size, called pages
❚ To run a program of size n pages, we need n free frames
❚ Set up a page table to translate logical to physical addresses
Ø The user sees memory as a contiguous space (0 to MAX), but the OS does not need to allocate it this way
Example page table: page 0 → frame 5, page 1 → frame 1, page 2 → frame 7, page 3 → frame 3
Paging: Internal Fragmentation
❚ Calculating internal fragmentation
Ø Page size = 2,048 bytes
Ø Process size = 72,766 bytes
Ø 35 pages + 1,086 bytes
Ø Internal fragmentation of 2,048 − 1,086 = 962 bytes
Ø Worst-case fragmentation = 1 frame − 1 byte
Ø Average fragmentation = 1/2 frame size
❚ So are small frame sizes desirable? → more entries
Ø Each page table takes memory to track
❚ Page sizes are growing over time
Ø Solaris supports two page sizes – 8 KB and 4 MB
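The slide's internal-fragmentation arithmetic can be checked directly:

```python
# Reproducing the internal-fragmentation calculation from the slide.
PAGE = 2048
process = 72766
full_pages, leftover = divmod(process, PAGE)   # 35 full pages + 1086 bytes
pages_needed = full_pages + (1 if leftover else 0)
internal_frag = PAGE - leftover if leftover else 0
# pages_needed = 36, internal_frag = 2048 - 1086 = 962 bytes
```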
Address Translation
❚ Suppose the logical address space is 2^m and the page size is 2^n, so the number of pages is 2^m / 2^n = 2^(m−n)
❚ A logical address (m bits) is divided into:
Ø Page number (p) – used as an index into a page table, which contains the base address of each page in physical memory
Ø Page offset (d) – combined with the base address to define the physical memory address that is sent to the memory unit

| page number (p): m − n bits | page offset (d): n bits |
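The split into page number and offset is just bit arithmetic. A small sketch (the 16-bit address and 4 KB page are example values):

```python
# Split an address into (page number, offset) for a 2^n-byte page:
# the high m-n bits are p, the low n bits are d.
def split(addr, n):
    return addr >> n, addr & ((1 << n) - 1)

# e.g., n = 12 (4 KB pages): address 0x3A7F has p = 0x3, d = 0xA7F
p, d = split(0x3A7F, 12)
```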
Paging Hardware
Paging Example
❚ Suppose the page size is 4 bytes.
❚ What would you say about the size of the logical address space, the size of the physical address space, and the size of the page table?
Shared Pages
❚ Shared code
Ø One copy of read-only (reentrant) code shared among processes (e.g., text editors, compilers, window systems)
Ø Shared code must appear in the same location in the logical address space of all processes
❚ Private code and data
Ø Each process keeps a separate copy of the code and data
Ø The pages for the private code and data can appear anywhere in the logical address space
Shared Pages Example
Free Frames
[Figure: free-frame list before and after allocation]
Valid (v) or Invalid (i) Bit in a Page Table
Page Table Entry: Protection & Other Bits
❚ Page Table Entry (PTE): information needed
Ø Valid-invalid bit attached for memory protection
ü “valid” indicates that the associated page is a legal page that is in memory
ü “invalid” indicates that the page is not in memory (or not legal, for protection)
Ø Referenced bit: set if the page has been accessed recently
Ø Dirty (modified) bit: set if data on the page has been modified
Ø Protection information: read-only / writable / executable or not
❚ Size of each PTE: frame number plus several bits
Ø Number of bits rounded up to a power of 2 → if 24 bits are needed, use 32 bits (4 bytes)

PTE layout: | Frame number | V | R | D | Protection |  (V = valid bit, R = referenced bit, D = dirty bit)
Size of a Page/Frame: An Example
❚ Example:
Ø Logical (virtual) address: 22 bits → 4 MB
Ø Physical address (bus width): 20 bits → 1 MB
Ø Page/frame size: 1 KB, requiring 10 offset bits
Virtual address:  | P#: 12 bits | Offset: 10 bits |
Physical address: | F#: 10 bits | Offset: 10 bits |
Ø Number of pages: 12 bits → 4K pages
Ø Number of frames: 10 bits → 1K frames
Ø Size of page table:
§ Each page-table entry must hold at least 10 bits → 2 bytes
§ 4K * 2 bytes = 8 KB → requires 8 consecutive frames
What are the problems with this option?
Size of Page/Frame: How Big?
❚ Determined by the number of bits in the offset (512 B → 16 KB, and growing)
❚ Smaller pages
Ø + Less internal fragmentation
Ø − Too large a page table: it spans more than one frame (which must be contiguous, since the table is indexed), hard to allocate!
❚ Larger pages
Ø + Smaller page table, so less overhead to keep track of it
Ø + More efficient to transfer larger pages to/from disk
Ø − More internal fragmentation; waste of memory
❚ Design principle: fit the page table into one frame
Ø If not, use multi-level paging (discussed later)
Implementation of Page Table
❚ Where should we store the page table?
Ø Registers (fast and efficient, but limited), main memory, or dedicated lookup tables
❚ In memory:
Ø Page-table base register (PTBR) points to the page table in memory
Ø Page-table length register (PTLR) indicates the size of the page table
❚ Every data/instruction access then requires two memory accesses
Ø One for the page table and one for the data/instruction
Translation Look-aside Buffers (TLB)
❚ Translation look-aside buffer (TLB)
Ø A special fast-lookup hardware cache: associative memory
Ø Accessed by content → parallel search: if the page is in the TLB, get the frame # directly
Ø Otherwise, access the page table in memory, get the frame #, and put it into the TLB
ü (If the TLB is full, replace an old entry; wired-down entries are not removed)

| Page # | Frame # |   (search by target page #; if found, get the target frame #)
Translation Look-aside Buffers (TLB)
❚ Some TLBs store address-space identifiers (ASIDs) in each TLB entry
❚ An ASID (such as the PID) uniquely identifies each process, providing address-space protection for that process
❚ Otherwise, without ASIDs in the TLB, the TLB must be flushed whenever a different process runs → context-switch cost
Paging Hardware With TLB
Memory Management
❚ Background
❚ Swapping
❚ Contiguous Memory Allocation and Fragmentation
❚ Paging
❚ Structure of the Page Table
❚ TLB
Virtual and Physical Addresses
❚ Virtual address space
Ø Determined by instruction width
Ø Same for all processes
❚ Physical memory, indexed by physical addresses
Ø Limited by bus size (# of bits) and the amount of available memory
[Figure: the MMU sits on the CPU chip; virtual addresses go from the CPU to the MMU, and physical addresses go on the bus to memory and the disk controller]
Paging: a memory-management scheme that permits the address space of a process to be non-contiguous.
Another Example
❚ Example:
Ø 64 KB virtual memory
Ø 32 KB physical memory
Ø 4 KB page/frame size → 12 bits as offset (d)
Virtual address:  16 bits = | Page #: 4 bits | Offset: 12 bits |
Physical address: 15 bits = | Frame #: 3 bits | Offset: 12 bits |
How many virtual pages? How many physical frames?
Address Translation
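The slide's two questions follow directly from the bit widths:

```python
# Counting pages and frames for the 64 KB / 32 KB / 4 KB example.
virtual_bits, physical_bits, offset_bits = 16, 15, 12
num_pages  = 2 ** (virtual_bits - offset_bits)    # 2^4 = 16 virtual pages
num_frames = 2 ** (physical_bits - offset_bits)   # 2^3 = 8 physical frames
```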
More on Page Table
❚ Different processes have different page tables
Ø CR3 points to the page table
Ø The CR3 register is changed on a context switch
❚ The page table resides in main (physical) memory
Ø In a contiguous memory segment — why?
Address Translation Architecture
[Figure: the CPU issues a virtual address split into page number p and offset d; the page table maps p to frame number f; the physical address (f, d) selects frame f in physical memory]
How big is the page table?
Page Table Size for a 32-bit System
❚ Modern systems/applications
Ø 32-bit virtual address
Ø System with 1 GB physical memory → 30-bit physical address
Ø Suppose the size of one page/frame is 4 KB (12 bits)
❚ Page table size
Ø # of virtual pages: 32 − 12 = 20 bits → 2^20 PTEs
Ø Page table size = PTE size * 2^20 = 4 MB per process → 2^10 frames
❚ If there are 128 processes
Ø Page tables occupy 128 * 4 MB = 512 MB
Ø 50% of memory would be used by page tables?
How can we get a smaller page table?
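The arithmetic above checks out (assuming the common 4-byte PTE):

```python
# Page-table-size arithmetic for a 32-bit system with 4 KB pages.
PTE_SIZE = 4                        # bytes per PTE (a common choice)
num_ptes = 2 ** (32 - 12)           # 2^20 virtual pages
table_bytes = num_ptes * PTE_SIZE   # 4 MB per process
frames = table_bytes // 4096        # 2^10 frames just for the table
total_128 = 128 * table_bytes       # 512 MB for 128 processes
```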
Two-Level Page Tables of Linux
❚ Solution: multi-level page tables
❚ Virtual address: three parts
Ø Level-one page number (10 bits)
Ø Level-two page number (10 bits)
Ø Offset (12 bits)
❚ A PTE in the 1st-level page table contains the physical frame # of one 2nd-level page table
❚ The 2nd-level page table has the actual physical frame numbers for the memory address
[Figure: the 1st-level page table points to 2nd-level page tables, whose entries point to frames in main memory]
Two-Level Page Tables of Linux
❚ Why is it good?
Ø We don’t have to allocate all levels initially
Ø They don’t have to be contiguous
Example: 2-level Address Translation
Page number: p1 = 10 bits, p2 = 10 bits; page offset = 12 bits
[Figure: the page-table base register points to the 1st-level page table; p1 indexes it to locate a 2nd-level page table; p2 indexes that table to obtain the frame number, which is combined with the offset to form the physical address]
Which tables should be in memory?
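The 10/10/12 walk in the figure can be sketched as plain bit arithmetic. Here dicts stand in for frames of PTEs, and the table contents (frame 18 for p1 = 1, p2 = 2) are made-up illustration values:

```python
# Two-level address translation with a 10-bit p1, 10-bit p2, 12-bit offset.
def translate(vaddr, level1):
    p1 = (vaddr >> 22) & 0x3FF      # top 10 bits index the 1st level
    p2 = (vaddr >> 12) & 0x3FF      # next 10 bits index the 2nd level
    offset = vaddr & 0xFFF          # low 12 bits pass through unchanged
    level2 = level1[p1]             # 1st-level PTE -> a 2nd-level table
    frame = level2[p2]              # 2nd-level PTE -> physical frame #
    return (frame << 12) | offset

# Hypothetical tables mapping virtual page (p1=1, p2=2) to frame 18:
l1 = {1: {2: 18}}
# translate(0x00402ABC, l1) yields frame 18 with offset 0xABC -> 0x12ABC
```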
Memory Requirement of Page Tables
❚ Only the 1st-level page table and the required 2nd-level page tables need to be in memory
❚ Example: a process with a working-set size of 32 MB (recall: 1 GB memory and 32-bit virtual addresses)
Ø 4 KB/page → the process has 8K (8 * 2^10) virtual pages
Ø One 2nd-level page table maps 2^10 pages
Ø Minimum number of 2nd-level page tables needed: 8 * 2^10 / 2^10 = 8
Ø Total (minimum) memory for page tables: the 1st-level page table + 8 second-level tables = 9 tables → 9 * 4 KB = 36 KB
Page Table Size
❚ 32-bit machine, 4 KB pages, each entry 4 bytes, one-level page table (full 4 GB linear address space)
Ø Page table size = 2^20 entries * 4 bytes = 2^22 bytes = 4 MB
❚ 32-bit machine, 4 KB pages, each entry 4 bytes, two-level page table, only two pages mapped (0x00000000 and 0xFFFFF000)
Ø Page table size = (2^10 level-0 entries * 4 bytes) + 2 * (2^10 level-1 entries * 4 bytes) = 12 KB
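Both cases can be verified directly; note that the two mapped pages (0x00000000 and 0xFFFFF000) fall under different level-0 entries, so two level-1 tables are needed:

```python
# Verifying the one-level vs. two-level page-table sizes above.
ENTRY = 4                              # bytes per PTE
one_level = (2 ** 20) * ENTRY          # full 4 GB / 4 KB pages -> 4 MB
two_level = (2 ** 10) * ENTRY + 2 * ((2 ** 10) * ENTRY)
# one level-0 table plus two level-1 tables -> 12 KB
```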
Quiz
❚ Why does the page table have to be physically contiguous?
❚ If we have two pages (0x00000000 and 0x0020100), what is the size of the page table?
Linux’s 3-Level Page Table
❚ A linear address is converted to a physical address using 3 levels:
| Index into Page Dir. | Index into Page Middle Dir. | Index into Page Table | Page Offset |
❚ What is the benefit of using a 3-level page table? What is the shortcoming?
Size of Page/Frame: How Big?
❚ Determined by the number of bits in the offset (12 bits → 4 KB)
❚ Smaller pages have advantages
Ø Less internal fragmentation
Ø Better fit for various data structures, code sections
❚ Larger pages are better because
Ø Less overhead to keep track of them
Ø More efficient to transfer larger pages to and from disk
❚ One principle: the page table should fit into one frame
Ø (32-bit machine, 10 bits for each level)
How can we make address translation faster?
Integrating VM and Cache
❚ Most caches are “physically addressed”
Ø Accessed by physical addresses
Ø Allows multiple processes to have blocks in the cache at the same time; otherwise a context switch would mean a cache flush
Ø Allows multiple processes to share pages
Ø The cache doesn’t need to be concerned with protection issues
ü Access rights are checked as part of address translation
❚ Perform address translation before the cache lookup
Ø Could involve a memory access itself (to get the PTE)
Ø So page table entries can also be cached
[Figure: CPU → translation (VA → PA) → cache; on a cache miss, the request goes to main memory]
Example TLB
[Figure: example TLB contents — a small table of logical page numbers and the physical frame numbers they map to, with some entries unused]
Translation Lookaside Buffer (TLB)
❚ Small hardware: fast
❚ Stores recently accessed page → frame mappings (64 ~ 1024 entries)
❚ If the desired logical page number is found, get the frame number from the TLB
❚ If not:
Ø Get the frame number from the page table in memory
Ø Use standard cache techniques
Ø Replace an entry in the TLB with the logical & physical page numbers from this reference
❚ Contains complete page table entries for a small number of pages
Address Translation with TLB
[Figure: the CPU issues a virtual address (p, d); the TLB hardware is searched first; on a hit, the frame number f comes directly from the TLB; on a miss, it comes from the page table in memory; the physical address is (f, d). A page fault is raised if the valid bit is not set; the page-table base register is kept in the PCB.]
What happens when the CPU performs a context switch?
Integrating TLB and Cache
❚ “Translation Lookaside Buffer” (TLB)
[Figure: the CPU presents a virtual address to the TLB lookup; on a TLB hit, the physical address goes straight to the cache; on a TLB miss, full translation runs first; on a cache miss, the request goes to main memory]
❚ Understand the workflow!
Memory Access Time
❚ Assuming:
Ø TLB lookup time = a
Ø Memory access time = m
❚ Hit ratio (h) is the percentage of time that a logical page number is found in the TLB
Ø More TLB entries usually means a higher h
❚ Effective Access Time (EAT), ignoring cache effects:
Ø EAT = (m + a)h + (m + m + a)(1 − h) = a + (2 − h)m
❚ Interpretation
Ø Every reference requires a TLB lookup and 1 memory access
Ø TLB misses also require an additional memory reference
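The EAT formula is easy to sanity-check numerically; the timing values below are illustrative:

```python
# Effective Access Time: hits cost one memory access plus the TLB
# lookup; misses cost two memory accesses plus the TLB lookup.
def eat(m, a, h):
    return (m + a) * h + (2 * m + a) * (1 - h)   # = a + (2 - h) * m

# e.g., m = 100 ns, a = 20 ns, h = 0.98 gives 20 + 1.02 * 100 = 122 ns
```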
Assignment 3
❚ Understanding the page table and physical memory management
Ø Single process (no context switch)
Ø Everything is simulated
Assignment 3
❚ Part 1:
Ø Given the page table for a simple process
Ø No page replacement
❚ Part 2:
Ø Physical page replacement
Ø Create the mapping between virtual addresses and physical addresses
Ø Clean up the page table when a page is evicted
❚ Part 3:
Ø Compute the number of page-table entries, the page/frame size, and the number of physical pages
Least Recently Used Algorithm (LRU)
❚ A refinement of NRU that orders how recently a page was used
Ø Keep track of when a page is used
Ø Replace the page that has been used least recently
(Slides borrowed from Jonathan Walpole at Portland State)
LRU page replacement
❚ Replace the page that hasn’t been referenced in the longest time
Time:     0  1  2  3  4  5  6  7  8  9  10
Requests:    c  a  d  b  e  b  a  b  c  d
Frame 0:  a  a  a  a  a  a  a  a  a  a  a
Frame 1:  b  b  b  b  b  b  b  b  b  b  b
Frame 2:  c  c  c  c  c  e  e  e  e  e  d
Frame 3:  d  d  d  d  d  d  d  d  d  c  c
Faults:                  X           X  X
Least Recently Used Algorithm (LRU)
❚ But how can we implement this?
❚ Implementation #1:
Ø Keep a linked list of all pages
Ø On every memory reference:
ü Move that page to the front of the list
Ø The page at the tail of the list is replaced
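Implementation #1 can be sketched with an OrderedDict standing in for the linked list (most recently used at one end, victim at the other); the class and its names are illustrative, not part of the assignment:

```python
# LRU replacement via an ordered structure that plays the linked list's
# role: a hit moves the page to the "front", a fault evicts the "tail".
from collections import OrderedDict

class LRU:
    def __init__(self, nframes):
        self.nframes = nframes
        self.frames = OrderedDict()          # insertion order = recency

    def reference(self, page):
        """Process one memory reference; return True on a page fault."""
        if page in self.frames:
            self.frames.move_to_end(page)    # move to front of the list
            return False
        if len(self.frames) >= self.nframes:
            self.frames.popitem(last=False)  # evict least recently used
        self.frames[page] = True
        return True

# Replaying the slide's trace c a d b e b a b c d, with frames a b c d
# preloaded, produces page faults at e, c, and d (3 faults total).
```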