TRANSCRIPT
CS 3723 Operating Systems: Memory Management (SGG-08)
Department of Computer Science @ UTSA
Instructor: Dr. Tongping Liu
Thanks to Dr. Dakai Zhu and Dr. Palden Lama for providing their slides.
Memory Management
❚ Background
❚ Swapping
❚ Contiguous Memory Allocation and Fragmentation
❚ Paging
❚ Structure of the Page Table
❚ TLB
Objectives
❚ To provide a detailed description of various ways of organizing memory hardware
❚ To discuss various memory-management techniques, including paging and segmentation
❚ To provide a detailed description of the Intel Pentium, which supports both pure segmentation and segmentation with paging
❚ The CPU can directly access main memory and registers only
❚ But programs and data are stored on disk, so they must be brought (from disk) into memory
❚ Memory accesses can be a bottleneck
Ø Cache sits between memory and the CPU registers
❚ Memory hierarchy
Ø Cache: small, fast, expensive; SRAM
Ø Main memory: medium speed; DRAM
Ø Disk: slow, cheap, non-volatile storage
[Figure: hierarchy of CPU (processor/ALU), internal memory, and I/O devices (disks)]
Background
❚ Think of memory as an array of words containing program instructions and data
❚ How do we execute a program?
Ø Fetch an instruction → decode → may fetch operands → execute → may store results
❚ The memory unit sees a stream of ADDRESSES
❚ How do we manage and protect main memory while sharing it among multiple processes?
Ø Keeping multiple processes in memory is essential to improving CPU utilization
Simple one: Base and Limit Registers
❚ Memory protection is required to ensure correct operation
❚ A pair of base and limit registers defines the logical address space of a process; every memory access is checked by hardware. Any problem? Too slow
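The base-and-limit check above can be sketched as follows. This is a minimal illustration, not actual hardware behavior; the register values are hypothetical.

```python
# Sketch of the hardware base/limit check: every logical address a
# user process generates is validated against the limit register
# before the base register relocates it into physical memory.
def check_access(logical_addr, base, limit):
    """Return the physical address, or trap on a protection violation."""
    if 0 <= logical_addr < limit:
        return base + logical_addr          # inside the process's space
    raise MemoryError("trap: addressing error")  # OS terminates/handles

# A process with base 300040 and limit 120900 may legally touch
# logical addresses 0 .. 120899 only.
```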
2
How to Bind Instructions and Data to Memory
❚ Address binding of instructions and data to memory addresses can happen at three different stages
Ø Compile time: if the memory location is known beforehand, absolute code can be generated; must recompile the code if the starting location changes (e.g., DOS .com programs)
Ø Load time: the compiler generates relocatable code; final binding occurs at load time
Ø Execution time: binding is delayed until execution time if the process can be moved during its execution from one memory segment to another
Logical vs. Physical Address Space
❚ Logical address
Ø Generated by the CPU; also referred to as a virtual address
❚ Physical address
Ø Address seen by the memory unit
❚ Logical and physical addresses are the same in compile-time and load-time address-binding schemes
❚ Logical (virtual) and physical addresses differ in the execution-time address-binding scheme
Ø The mapping from logical address to physical address is done by a hardware device called the memory-management unit (MMU)
Ø We will mainly study how this mapping is done and what hardware support is needed
Memory-Management Unit (MMU)
❚ Hardware device that maps logical (virtual) addresses to physical addresses
❚ In a simple MMU, the value in the relocation register (base) is added to every address generated by a user process at the time it is sent to memory
❚ The user program deals with logical addresses, not real physical addresses
Dynamic relocation using a relocation register
Memory-Management Unit (MMU)
❚ Logical addresses are in the range 0 ~ max; physical addresses are in the range R+0 to R+max, for a base value R
❚ The user program generates only logical addresses and thinks that the process runs in locations 0 to max
❚ Logical addresses must be mapped to physical addresses before they are used
Dynamic relocation using a relocation register
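Dynamic relocation with a relocation register reduces to one addition per access. A minimal sketch (the base value 14000 is the textbook-style example, chosen here for illustration):

```python
# Simple MMU: the relocation (base) register is added to every logical
# address the process generates, so a program that believes it runs at
# addresses 0..max actually occupies R..R+max in physical memory.
RELOCATION_REGISTER = 14000   # hypothetical base value R

def mmu_translate(logical_addr):
    # No limit check here; this models only the relocation step.
    return RELOCATION_REGISTER + logical_addr

# logical address 346 maps to physical address 14346
```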
Dynamic Loading
❚ Why dynamic loading?
Ø Without it, the size of a process is limited to that of physical memory
❚ Dynamic loading:
Ø Dynamically load routines when they are called
Ø All other routines are kept on disk in a loadable format
❚ Better memory-space utilization, since unused routines are never loaded
❚ Useful when large amounts of code are needed to handle infrequently occurring cases, such as error handling
Dynamic Linking and Shared Libraries
❚ Static linking
Ø System language and library routines are included in the binary code
❚ Dynamic linking: similar to dynamic loading
Ø Linking is postponed until execution time
Ø Without it, every library has a copy in each executable file, wasting both disk space and memory
Ø A stub is used to locate and load the appropriate memory-resident library routine when the routine is not yet present
Ø If it is already present, there is no need to load it again (PLT)
❚ Dynamic linking is particularly useful for libraries (one copy, transparent updates)
Memory Management
❚ Background
❚ Swapping
❚ Contiguous Memory Allocation and Fragmentation
❚ Paging
❚ Structure of the Page Table
❚ TLB
Swapping
❚ Consider a multiprogramming environment:
Ø Each program must be in memory to be executed
Ø Processes come into memory, and leave memory when execution completes
❚ What if no free region is big enough? Reject it! But if you want to support more processes:
Ø Swap an old process out to disk (waiting for long I/O, quantum expired, etc.)
[Figure: sequence of memory snapshots showing processes A–D being admitted, swapped out, and swapped back in]
Swapping
❚ A process can be swapped temporarily out of memory to a backing store, and then brought back into memory for continued execution
Ø Backing store – large enough to accommodate copies of all memory images for all users; must provide direct access to these memory images
Ø Roll out, roll in – swapping variant used for priority-based scheduling algorithms; a lower-priority process is swapped out so a higher-priority process can be loaded and executed
❚ Swapping can free up memory for additional processes
Swapping (cont’d)
❚ The major part of swap time is transfer time
Ø Total transfer time is directly proportional to the amount of memory swapped (e.g., a 10 MB process / 40 MB per sec = 0.25 sec)
Ø May take too much time to be used often
❚ Standard swapping requires too much swapping time
❚ Modified versions of swapping are found on many systems (e.g., UNIX, Linux, and Windows), but it is often disabled
Memory Management
❚ Background
❚ Swapping
❚ Contiguous Memory Allocation and Fragmentation
❚ Paging
❚ Structure of the Page Table
❚ TLB
Contiguous Allocation
❚ Main memory is usually divided into two partitions:
Ø Resident operating system, usually held in low memory
Ø User processes, usually held in high memory
❚ Relocation registers are used to protect user processes from each other, and from changing operating-system code and data
Ø The MMU maps logical addresses to physical addresses dynamically
Ø But the physical addresses must be contiguous
Contiguous Allocation (Cont.)
❚ Multiple-partition allocation
Ø Hole – block of available memory
ü Holes of various sizes are scattered throughout memory
Ø When a process arrives, the OS allocates memory from a hole large enough to accommodate it
Ø The operating system maintains information about: a) allocated partitions, b) free partitions (holes)
"OS"
process 5"
process 8"
process 2"
OS"
process 5"
process 2"
OS"
process 5"
process 2"
OS"
process 5"process 9"
process 2"
process 9"
process 10"
OS"
process 9"
process 2"
process 10"
.
Dynamic Storage-Allocation Problem
How to satisfy a request of size n from a list of free holes?
❚ First-fit: allocate the first hole that is big enough
❚ Best-fit: allocate the smallest hole that is big enough
Ø Must search the entire list, unless it is ordered by size
Ø Produces the smallest leftover hole
❚ Worst-fit: allocate the largest hole
Ø Must also search the entire list
Ø Produces the largest leftover hole
❚ First-fit and best-fit are better than worst-fit in terms of speed and storage utilization, but all suffer from fragmentation
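The three placement strategies can be sketched over a list of hole sizes. This is an illustrative sketch (the hole list is made up); each function returns the index of the chosen hole, or None if nothing fits.

```python
# Placement strategies for the dynamic storage-allocation problem.
def first_fit(holes, n):
    for i, size in enumerate(holes):
        if size >= n:
            return i                 # first hole big enough
    return None

def best_fit(holes, n):
    fits = [(size, i) for i, size in enumerate(holes) if size >= n]
    return min(fits)[1] if fits else None   # smallest hole that fits

def worst_fit(holes, n):
    fits = [(size, i) for i, size in enumerate(holes) if size >= n]
    return max(fits)[1] if fits else None   # largest hole

# For holes of [100, 500, 200, 300, 600] KB and a 212 KB request:
# first-fit picks hole 1 (500), best-fit hole 3 (300), worst-fit hole 4 (600)
```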
Fragmentation
❚ External fragmentation
Ø Total memory space exists to satisfy a request, but it is not contiguous
❚ Internal fragmentation
Ø Allocated memory may be slightly larger than the requested memory
Ø This size difference is internal fragmentation
❚ How can we reduce external fragmentation?
Ø Compaction: move memory contents to place all free memory together in one large block; possible only if relocation is dynamic and is done at execution time
Ø I/O problem → large overhead
❚ How about not requiring programs to be loaded contiguously?
OS"
process 9"
process 2"
process 10"
Memory Management
❚ Background
❚ Swapping
❚ Contiguous Memory Allocation and Fragmentation
❚ Paging
❚ Structure of the Page Table
❚ TLB
Paging: Basic Ideas
[Figure: logical memory pages 0–3 (0–4K … 12–16K) mapped through a page table to scattered frames of a 32 KB physical memory]
❚ A page is mapped to a frame
❚ Divide physical memory into fixed-sized blocks called frames
Ø Size is a power of 2, between 512 bytes and 16 MB or more
❚ Divide logical memory into blocks of the same size, called pages
❚ To run a program of size n pages, we need n free frames
❚ Set up a page table to translate logical to physical addresses
Ø The user sees memory as a contiguous space (0 to MAX), but the OS does not need to allocate it this way
Example page table: page 0 → frame 5, page 1 → frame 1, page 2 → frame 7, page 3 → frame 3
Paging: Internal Fragmentation
❚ Calculating internal fragmentation
Ø Page size = 2,048 bytes
Ø Process size = 72,766 bytes
Ø 35 pages + 1,086 bytes
Ø Internal fragmentation of 2,048 − 1,086 = 962 bytes
Ø Worst-case fragmentation = 1 frame − 1 byte
Ø Average fragmentation = 1/2 frame size
❚ So are small frame sizes desirable? → more entries
Ø Each page table takes memory to track
❚ Page sizes are growing over time
Ø Solaris supports two page sizes – 8 KB and 4 MB
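The slide's internal-fragmentation arithmetic can be checked directly:

```python
# Reproducing the internal-fragmentation calculation from the slide.
PAGE = 2048
process = 72766
full_pages, leftover = divmod(process, PAGE)   # 35 full pages + 1086 bytes
pages_needed = full_pages + (1 if leftover else 0)
internal_frag = PAGE - leftover if leftover else 0
# pages_needed = 36, internal_frag = 2048 - 1086 = 962 bytes
```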
Address Translation
❚ Suppose the logical address space is 2^m and the page size is 2^n, so the number of pages is 2^m / 2^n = 2^(m−n)
❚ A logical address (m bits) is divided into:
Ø Page number (p) – used as an index into a page table, which contains the base address of each page in physical memory
Ø Page offset (d) – combined with the base address to define the physical memory address that is sent to the memory unit

| page number (p): m − n bits | page offset (d): n bits |
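The split into page number and offset is just bit arithmetic. A small sketch (the 16-bit address and 4 KB page are example values):

```python
# Split an address into (page number, offset) for a 2^n-byte page:
# the high m-n bits are p, the low n bits are d.
def split(addr, n):
    return addr >> n, addr & ((1 << n) - 1)

# e.g., n = 12 (4 KB pages): address 0x3A7F has p = 0x3, d = 0xA7F
p, d = split(0x3A7F, 12)
```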
Paging Hardware
Paging Example
❚ Suppose the page size is 4 bytes.
❚ What would you say about the size of the logical address space, the size of the physical address space, and the size of the page table?
Shared Pages
❚ Shared code
Ø One copy of read-only (reentrant) code shared among processes (e.g., text editors, compilers, window systems)
Ø Shared code must appear in the same location in the logical address space of all processes
❚ Private code and data
Ø Each process keeps a separate copy of the code and data
Ø The pages for the private code and data can appear anywhere in the logical address space
Shared Pages Example
Free Frames
[Figure: free-frame list before and after allocation]
Valid (v) or Invalid (i) Bit in a Page Table
Page Table Entry: Protection & Other Bits
❚ Page Table Entry (PTE): information needed
Ø Valid-invalid bit attached for memory protection
ü “valid” indicates that the associated page is a legal page that is in memory
ü “invalid” indicates that the page is not in memory (or not legal, for protection)
Ø Referenced bit: set if the page has been accessed recently
Ø Dirty (modified) bit: set if data on the page has been modified
Ø Protection information: read-only / writable / executable or not
❚ Size of each PTE: frame number plus several bits
Ø Number of bits rounded up to a power of 2 → if 24 bits are needed, use 32 bits (4 bytes)

PTE layout: | Frame number | V | R | D | Protection |  (V = valid bit, R = referenced bit, D = dirty bit)
Size of a Page/Frame: An Example
❚ Example:
Ø Logical (virtual) address: 22 bits → 4 MB
Ø Physical address (bus width): 20 bits → 1 MB
Ø Page/frame size: 1 KB, requiring 10 offset bits
Virtual address:  | P#: 12 bits | Offset: 10 bits |
Physical address: | F#: 10 bits | Offset: 10 bits |
Ø Number of pages: 12 bits → 4K pages
Ø Number of frames: 10 bits → 1K frames
Ø Size of page table:
§ Each page-table entry must hold at least 10 bits → 2 bytes
§ 4K * 2 bytes = 8 KB → requires 8 consecutive frames
What are the problems with this option?
Size of Page/Frame: How Big?
❚ Determined by the number of bits in the offset (512 B → 16 KB, and growing)
❚ Smaller pages
Ø + Less internal fragmentation
Ø − Too large a page table: it spans more than one frame (which must be contiguous, since the table is indexed), hard to allocate!
❚ Larger pages
Ø + Smaller page table, so less overhead to keep track of it
Ø + More efficient to transfer larger pages to/from disk
Ø − More internal fragmentation; waste of memory
❚ Design principle: fit the page table into one frame
Ø If not, use multi-level paging (discussed later)
Implementation of Page Table
❚ Where should we store the page table?
Ø Registers (fast and efficient, but limited), main memory, or dedicated lookup tables
❚ In memory:
Ø Page-table base register (PTBR) points to the page table in memory
Ø Page-table length register (PTLR) indicates the size of the page table
❚ Every data/instruction access then requires two memory accesses
Ø One for the page table and one for the data/instruction
Translation Look-aside Buffers (TLB)
❚ Translation look-aside buffer (TLB)
Ø A special fast-lookup hardware cache: associative memory
Ø Accessed by content → parallel search: if the page is in the TLB, get the frame # directly
Ø Otherwise, access the page table in memory, get the frame #, and put it into the TLB
ü (If the TLB is full, replace an old entry; wired-down entries are not removed)

| Page # | Frame # |   (search by target page #; if found, get the target frame #)
Translation Look-aside Buffers (TLB)
❚ Some TLBs store address-space identifiers (ASIDs) in each TLB entry
❚ An ASID (such as the PID) uniquely identifies each process, providing address-space protection for that process
❚ Otherwise, without ASIDs in the TLB, the TLB must be flushed whenever a different process runs → context-switch cost
Paging Hardware With TLB
Memory Management
❚ Background
❚ Swapping
❚ Contiguous Memory Allocation and Fragmentation
❚ Paging
❚ Structure of the Page Table
❚ TLB
Virtual and Physical Addresses
❚ Virtual address space
Ø Determined by instruction width
Ø Same for all processes
❚ Physical memory, indexed by physical addresses
Ø Limited by bus size (# of bits) and the amount of available memory
[Figure: the MMU sits on the CPU chip; virtual addresses go from the CPU to the MMU, and physical addresses go on the bus to memory and the disk controller]
Paging: a memory-management scheme that permits the address space of a process to be non-contiguous.
Another Example
❚ Example:
Ø 64 KB virtual memory
Ø 32 KB physical memory
Ø 4 KB page/frame size → 12 bits as offset (d)
Virtual address:  16 bits = | Page #: 4 bits | Offset: 12 bits |
Physical address: 15 bits = | Frame #: 3 bits | Offset: 12 bits |
How many virtual pages? How many physical frames?
Address Translation
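The slide's two questions follow directly from the bit widths:

```python
# Counting pages and frames for the 64 KB / 32 KB / 4 KB example.
virtual_bits, physical_bits, offset_bits = 16, 15, 12
num_pages  = 2 ** (virtual_bits - offset_bits)    # 2^4 = 16 virtual pages
num_frames = 2 ** (physical_bits - offset_bits)   # 2^3 = 8 physical frames
```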
More on Page Table
❚ Different processes have different page tables
Ø CR3 points to the page table
Ø The CR3 register is changed on a context switch
❚ The page table resides in main (physical) memory
Ø In a contiguous memory segment — why?
Address Translation Architecture
[Figure: the CPU issues a virtual address split into page number p and offset d; the page table maps p to frame number f; the physical address (f, d) selects frame f in physical memory]
How big is the page table?
Page Table Size for a 32-bit System
❚ Modern systems/applications
Ø 32-bit virtual address
Ø System with 1 GB physical memory → 30-bit physical address
Ø Suppose the size of one page/frame is 4 KB (12 bits)
❚ Page table size
Ø # of virtual pages: 32 − 12 = 20 bits → 2^20 PTEs
Ø Page table size = PTE size * 2^20 = 4 MB per process → 2^10 frames
❚ If there are 128 processes
Ø Page tables occupy 128 * 4 MB = 512 MB
Ø 50% of memory would be used by page tables?
How can we get a smaller page table?
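The arithmetic above checks out (assuming the common 4-byte PTE):

```python
# Page-table-size arithmetic for a 32-bit system with 4 KB pages.
PTE_SIZE = 4                        # bytes per PTE (a common choice)
num_ptes = 2 ** (32 - 12)           # 2^20 virtual pages
table_bytes = num_ptes * PTE_SIZE   # 4 MB per process
frames = table_bytes // 4096        # 2^10 frames just for the table
total_128 = 128 * table_bytes       # 512 MB for 128 processes
```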
Two-Level Page Tables of Linux
❚ Solution: multi-level page tables
❚ Virtual address: three parts
Ø Level-one page number (10 bits)
Ø Level-two page number (10 bits)
Ø Offset (12 bits)
❚ A PTE in the 1st-level page table contains the physical frame # of one 2nd-level page table
❚ The 2nd-level page table has the actual physical frame numbers for the memory address
[Figure: the 1st-level page table points to 2nd-level page tables, whose entries point to frames in main memory]
Two-Level Page Tables of Linux
❚ Why is it good?
Ø We don’t have to allocate all levels initially
Ø They don’t have to be contiguous
Example: 2-level Address Translation
Page number: p1 = 10 bits, p2 = 10 bits; page offset = 12 bits
[Figure: the page-table base register points to the 1st-level page table; p1 indexes it to locate a 2nd-level page table; p2 indexes that table to obtain the frame number, which is combined with the offset to form the physical address]
Which tables should be in memory?
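The 10/10/12 walk in the figure can be sketched as plain bit arithmetic. Here dicts stand in for frames of PTEs, and the table contents (frame 18 for p1 = 1, p2 = 2) are made-up illustration values:

```python
# Two-level address translation with a 10-bit p1, 10-bit p2, 12-bit offset.
def translate(vaddr, level1):
    p1 = (vaddr >> 22) & 0x3FF      # top 10 bits index the 1st level
    p2 = (vaddr >> 12) & 0x3FF      # next 10 bits index the 2nd level
    offset = vaddr & 0xFFF          # low 12 bits pass through unchanged
    level2 = level1[p1]             # 1st-level PTE -> a 2nd-level table
    frame = level2[p2]              # 2nd-level PTE -> physical frame #
    return (frame << 12) | offset

# Hypothetical tables mapping virtual page (p1=1, p2=2) to frame 18:
l1 = {1: {2: 18}}
# translate(0x00402ABC, l1) yields frame 18 with offset 0xABC -> 0x12ABC
```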
Memory Requirement of Page Tables
❚ Only the 1st-level page table and the required 2nd-level page tables need to be in memory
❚ Example: a process with a working-set size of 32 MB (recall: 1 GB memory and 32-bit virtual addresses)
Ø 4 KB/page → the process has 8K (8 * 2^10) virtual pages
Ø One 2nd-level page table maps 2^10 pages
Ø Minimum number of 2nd-level page tables needed: 8 * 2^10 / 2^10 = 8
Ø Total (minimum) memory for page tables: the 1st-level page table + 8 second-level tables = 9 tables → 9 * 4 KB = 36 KB
Page Table Size
❚ 32-bit machine, 4 KB pages, each entry 4 bytes, one-level page table (full 4 GB linear address space)
Ø Page table size = 2^20 entries * 4 bytes = 2^22 bytes = 4 MB
❚ 32-bit machine, 4 KB pages, each entry 4 bytes, two-level page table, only two pages mapped (0x00000000 and 0xFFFFF000)
Ø Page table size = (2^10 level-0 entries * 4 bytes) + 2 * (2^10 level-1 entries * 4 bytes) = 12 KB
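Both cases can be verified directly; note that the two mapped pages (0x00000000 and 0xFFFFF000) fall under different level-0 entries, so two level-1 tables are needed:

```python
# Verifying the one-level vs. two-level page-table sizes above.
ENTRY = 4                              # bytes per PTE
one_level = (2 ** 20) * ENTRY          # full 4 GB / 4 KB pages -> 4 MB
two_level = (2 ** 10) * ENTRY + 2 * ((2 ** 10) * ENTRY)
# one level-0 table plus two level-1 tables -> 12 KB
```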
Quiz
❚ Why does the page table have to be physically contiguous?
❚ If we have two pages (0x00000000 and 0x0020100), what is the size of the page table?
Linux’s 3-Level Page Table
❚ A linear address is converted to a physical address using 3 levels:
| Index into Page Dir. | Index into Page Middle Dir. | Index into Page Table | Page Offset |
❚ What is the benefit of using a 3-level page table? What is the shortcoming?
Size of Page/Frame: How Big?
❚ Determined by the number of bits in the offset (12 bits → 4 KB)
❚ Smaller pages have advantages
Ø Less internal fragmentation
Ø Better fit for various data structures, code sections
❚ Larger pages are better because
Ø Less overhead to keep track of them
Ø More efficient to transfer larger pages to and from disk
❚ One principle: the page table should fit into one frame
Ø (32-bit machine, 10 bits for each level)
How can we make address translation faster?
Integrating VM and Cache
❚ Most caches are “physically addressed”
Ø Accessed by physical addresses
Ø Allows multiple processes to have blocks in the cache at the same time; otherwise a context switch would mean a cache flush
Ø Allows multiple processes to share pages
Ø The cache doesn’t need to be concerned with protection issues
ü Access rights are checked as part of address translation
❚ Perform address translation before the cache lookup
Ø Could involve a memory access itself (to get the PTE)
Ø So page table entries can also be cached
[Figure: CPU → translation (VA → PA) → cache; on a cache miss, the request goes to main memory]
Example TLB
[Figure: example TLB contents — a small table of logical page numbers and the physical frame numbers they map to, with some entries unused]
Translation Lookaside Buffer (TLB)
❚ Small hardware: fast
❚ Stores recently accessed page → frame mappings (64 ~ 1024 entries)
❚ If the desired logical page number is found, get the frame number from the TLB
❚ If not:
Ø Get the frame number from the page table in memory
Ø Use standard cache techniques
Ø Replace an entry in the TLB with the logical & physical page numbers from this reference
❚ Contains complete page table entries for a small number of pages
Address Translation with TLB
[Figure: the CPU issues a virtual address (p, d); the TLB hardware is searched first; on a hit, the frame number f comes directly from the TLB; on a miss, it comes from the page table in memory; the physical address is (f, d). A page fault is raised if the valid bit is not set; the page-table base register is kept in the PCB.]
What happens when the CPU performs a context switch?
Integrating TLB and Cache
❚ “Translation Lookaside Buffer” (TLB)
[Figure: the CPU presents a virtual address to the TLB lookup; on a TLB hit, the physical address goes straight to the cache; on a TLB miss, full translation runs first; on a cache miss, the request goes to main memory]
❚ Understand the workflow!
Memory Access Time
❚ Assuming:
Ø TLB lookup time = a
Ø Memory access time = m
❚ Hit ratio (h) is the percentage of time that a logical page number is found in the TLB
Ø More TLB entries usually means a higher h
❚ Effective Access Time (EAT), ignoring cache effects:
Ø EAT = (m + a)h + (m + m + a)(1 − h) = a + (2 − h)m
❚ Interpretation
Ø Every reference requires a TLB lookup and 1 memory access
Ø TLB misses also require an additional memory reference
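The EAT formula is easy to sanity-check numerically; the timing values below are illustrative:

```python
# Effective Access Time: hits cost one memory access plus the TLB
# lookup; misses cost two memory accesses plus the TLB lookup.
def eat(m, a, h):
    return (m + a) * h + (2 * m + a) * (1 - h)   # = a + (2 - h) * m

# e.g., m = 100 ns, a = 20 ns, h = 0.98 gives 20 + 1.02 * 100 = 122 ns
```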
Assignment 3
❚ Understanding the page table and physical memory management
Ø Single process (no context switch)
Ø Everything is simulated
Assignment 3
❚ Part 1:
Ø Given the page table for a simple process
Ø No page replacement
❚ Part 2:
Ø Physical page replacement
Ø Create the mapping between virtual addresses and physical addresses
Ø Clean up the page table when a page is evicted
❚ Part 3:
Ø Compute the number of page-table entries, the page/frame size, and the number of physical pages
Least Recently Used Algorithm (LRU)
❚ A refinement of NRU that orders how recently a page was used
Ø Keep track of when a page is used
Ø Replace the page that has been used least recently
(Slides borrowed from Jonathan Walpole at Portland State)
LRU page replacement
❚ Replace the page that hasn’t been referenced in the longest time
Time:     0  1  2  3  4  5  6  7  8  9  10
Requests:    c  a  d  b  e  b  a  b  c  d
Frame 0:  a  a  a  a  a  a  a  a  a  a  a
Frame 1:  b  b  b  b  b  b  b  b  b  b  b
Frame 2:  c  c  c  c  c  e  e  e  e  e  d
Frame 3:  d  d  d  d  d  d  d  d  d  c  c
Faults:                  X           X  X
Least Recently Used Algorithm (LRU)
❚ But how can we implement this?
❚ Implementation #1:
Ø Keep a linked list of all pages
Ø On every memory reference:
ü Move that page to the front of the list
Ø The page at the tail of the list is replaced
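Implementation #1 can be sketched with an OrderedDict standing in for the linked list (most recently used at one end, victim at the other); the class and its names are illustrative, not part of the assignment:

```python
# LRU replacement via an ordered structure that plays the linked list's
# role: a hit moves the page to the "front", a fault evicts the "tail".
from collections import OrderedDict

class LRU:
    def __init__(self, nframes):
        self.nframes = nframes
        self.frames = OrderedDict()          # insertion order = recency

    def reference(self, page):
        """Process one memory reference; return True on a page fault."""
        if page in self.frames:
            self.frames.move_to_end(page)    # move to front of the list
            return False
        if len(self.frames) >= self.nframes:
            self.frames.popitem(last=False)  # evict least recently used
        self.frames[page] = True
        return True

# Replaying the slide's trace c a d b e b a b c d, with frames a b c d
# preloaded, produces page faults at e, c, and d (3 faults total).
```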