WEEK 5: VIRTUAL MEMORY; INPUT AND OUTPUT
Operating Systems CS3013 / CS502
TRANSCRIPT
Agenda
Virtual Memory
Input and Output
Objectives
Identify virtual memory, page faults and overhead
Differentiate various page replacement policies
Explain advantages and/or disadvantages of different page replacement policies
Explain thrashing
Review
Virtual/Logical Address
  The address space in which a process “thinks”
  Distinguished from Physical Memory, the address space of the hardware memory system
  Multiple forms: base and limit registers, paging, segmentation
Memory Management Unit (MMU)
  Present in most modern processors
  Converts all virtual addresses to physical addresses
  Transparent to the execution of programs
Review
Translation Lookaside Buffer (TLB)
  Hardware associative memory for very fast page-table lookup of a small set of active pages
  Can be managed in the OS or in hardware (or both)
  Must be flushed on context switch
Page table organization
  Direct: maps virtual address to physical address
  Hashed
  Inverted: maps physical address to virtual address
Motivation
Logical address space larger than physical memory
  2^32 bytes, about 4 GB, in size
  “Virtual Memory”: kept on disk
  An abstraction for the programmer
Examples:
  Unused libraries
  Error-handling code that is never invoked
  Maximum-size arrays
Paging Implementation
[Figure: logical memory pages 0-3 mapped through a page table with valid (v) / invalid (i) bits into physical memory frames; invalid pages reside only in virtual memory on disk]
What happens when we access an invalid page?
Cache?
Translation Lookaside Buffer (TLB) and Page Table
Physical Memory and Virtual Memory
Issues:
  When to put something in the cache
  What to throw out to create cache space for new items
  How to keep the cached item and the stored item in sync after one or the other is updated
  Size of cache needed to be effective
  Size of cache items for efficiency
Virtual Memory
When to swap in a page: on demand, or in anticipation?
What to throw out: Page Replacement Policy
Keeping dirty pages in sync with disk: flushing strategy
Size of pages for efficiency: one size fits all, or multiple sizes?
No Free Frames
Page fault: what if there are no free frames?
  Terminate the process (out of memory)
  Swap out a process (reduces degree of multiprogramming)
  Replace another page with the needed page: page replacement
VM Page Replacement11
If there is an unused frame, use itIf there are no unused frames available, select
a victim (according to policy) and If it contains a dirty page
Write it to the disk Invalidate its PTE and TLB entry Load in new page from disk (or create new page) Update the PTE and TLB entry Restart the faulting instruction
What is cost of replace a page?How does the OS select the page to be
evicted?
Page Replacement
[Figure: two processes, A and B, each with a logical memory and its own page table; physical memory holds frames Page A0, Page A2, Page B0, and Page B1; valid/invalid bits in each page table show which pages of each process are resident]
Demand Paging Performance
Page Fault Rate (p): 0 ≤ p ≤ 1 (from no page faults to a fault on every reference)
Page Fault Overhead = read page time + fault service time + restart process time
  read page time ~ 8-20 msec
  fault service time ~ 0.1-10 μsec
  restart process time ~ 0.1-100 μsec
Dominated by the time to read the page in from disk
Demand Paging Performance
Effective Access Time (EAT) = (1 - p) * memory access time + p * page fault overhead
Want EAT to degrade no more than, say, 10% from true memory access time, i.e. EAT < (1 + 10%) * memory access time
Performance Example15
Memory access time = 100 nsPage fault overhead = 25 msPage fault rate = 1/1000
What is the Effective Access Time?
Performance Example
Memory access time = 100 ns; page fault overhead = 25 ms
Goal: achieve less than 10% degradation. What page fault rate is needed?
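The two examples above can be checked numerically. The sketch below is a minimal C helper (the function names `eat` and `max_fault_rate` are mine, not from the slides): with p = 1/1000 the EAT is about 25,100 ns, roughly 250 times slower than memory, and keeping degradation under 10% requires p below about 4 × 10^-7, i.e. fewer than one fault per 2.5 million references.

```c
/* Effective Access Time, straight from the slide's formula.
   All times are in nanoseconds. */
double eat(double p, double mem_ns, double fault_ns) {
    return (1.0 - p) * mem_ns + p * fault_ns;
}

/* Largest fault rate p for which EAT stays within `degrade`
   (e.g. 0.10 for 10%) of the memory access time:
   mem + p * (fault - mem) <= (1 + degrade) * mem  */
double max_fault_rate(double degrade, double mem_ns, double fault_ns) {
    return degrade * mem_ns / (fault_ns - mem_ns);
}
```

Calling `eat(0.001, 100.0, 25e6)` gives 25,099.9 ns; `max_fault_rate(0.10, 100.0, 25e6)` gives about 4.0e-7.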
Page Replacement Algorithms
Want the lowest page-fault rate
Evaluate an algorithm by running it on a particular string of memory references (reference string) and computing the number of page faults on that string
Reference string: ordered list of pages accessed as the process executes
Ex. Reference String: 1 2 3 1 2 4 1 4 2 3 2
First-In-First-Out (FIFO)18
Easy to implement When swapping a page in, place its page id on end of
list Evict page at head of list
Page to be evicted has been in memory the longest time, but … Maybe it is being used, very active even
Weird phenomenon: Belady’s Anomaly Page fault rate may increase when there is more
physical memory!
FIFO is rarely used in practice
FIFO Illustrating Belady’s Anomaly
[Figure: page faults vs. number of frames under FIFO, showing the fault count increasing when frames are added]
First-In-First-Out (FIFO)
Reference String: 1,2,3,4,1,2,5,1,2,3,4,5
3 Frames/Process
First-In-First-Out (FIFO)
Reference String: 1,2,3,4,1,2,5,1,2,3,4,5
3 Frames/Process: 9 page faults!
[Figure: frame contents after each reference under FIFO]
How can we reduce the number of page faults?
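The slide's count can be reproduced with a small FIFO simulator (a sketch; the function name is mine). On the reference string above it yields 9 faults with 3 frames, and, illustrating Belady's Anomaly, 10 faults with 4 frames.

```c
/* Count page faults for FIFO replacement on a reference string.
   frames[] acts as a circular FIFO of resident pages. */
int fifo_faults(const int *refs, int n, int nframes) {
    int frames[16];                 /* assumes nframes <= 16 */
    int used = 0, hand = 0, faults = 0;
    for (int i = 0; i < n; i++) {
        int hit = 0;
        for (int j = 0; j < used; j++)
            if (frames[j] == refs[i]) { hit = 1; break; }
        if (hit)
            continue;
        faults++;
        if (used < nframes) {
            frames[used++] = refs[i];        /* free frame available */
        } else {
            frames[hand] = refs[i];          /* evict oldest page    */
            hand = (hand + 1) % nframes;
        }
    }
    return faults;
}
```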
The Best Page to Replace
The best page to replace is the one that will never be accessed again.
Optimal Algorithm (Belady’s Rule)
  Lowest fault rate for any reference string
  Basically, replace the page that will not be used for the longest time in the future
  Belady’s Rule is a yardstick: we want to find close approximations
Optimal
Reference String: 1,2,3,4,1,2,5,1,2,3,4,5
4 Frames/Process
Optimal
Reference String: 1,2,3,4,1,2,5,1,2,3,4,5
4 Frames/Process: 6 page faults!
[Figure: frame contents after each reference under the optimal policy]
Minimum number of page faults; use this as a benchmark
Least Recently Used (LRU)25
Counter implementation Every page has a counter; every time page is
referenced, copy clock to counter When a page needs to be changed, compare the
counters to determine which to change
Stack implementation Keep a stack of page numbers Page referenced: move to top No search needed for replacement
LRU Approximation26
3
Reference String : 1,2,3,4,1,2,5,1,2,3,4,5
4 Frames/Process
Least Recently Used27
1
2
3
Reference String : 1,2,3,4,1,2,5,1,2,3,4,5
4 Frames/Process
5
8 page faults!
4 3
3 5 4
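An LRU simulator in the same style (a sketch; the counter-per-frame bookkeeping mirrors the slide's counter implementation) reproduces the 8 faults above.

```c
/* Count page faults for LRU: on a fault with no free frame, evict
   the page whose last use is oldest. last[j] records the "clock"
   (reference index) of frame j's most recent use. */
int lru_faults(const int *refs, int n, int nframes) {
    int frames[16], last[16];       /* assumes nframes <= 16 */
    int used = 0, faults = 0;
    for (int i = 0; i < n; i++) {
        int hit = -1;
        for (int j = 0; j < used; j++)
            if (frames[j] == refs[i]) { hit = j; break; }
        if (hit >= 0) { last[hit] = i; continue; }
        faults++;
        if (used < nframes) {
            frames[used] = refs[i];
            last[used++] = i;
            continue;
        }
        int victim = 0;
        for (int j = 1; j < used; j++)
            if (last[j] < last[victim])
                victim = j;
        frames[victim] = refs[i];
        last[victim] = i;
    }
    return faults;
}
```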
LRU Approximation28
LRU is good but hardware support expensive
Aging Keep a counter for each PTE Periodically (clock interrupt) – check R-bit
If R = 0, increment counter (page has not been used) If R = 1, clear the counter (page has been used) Clear R = 0
Counter contains # of intervals since last access Replace page having largest counter value
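The aging scheme above can be sketched directly (function names are mine; this is the slide's simplified interval counter, not the shift-register variant of aging found in some textbooks):

```c
/* One clock-tick of the aging scheme over npages PTEs:
   R = 1 -> page was used this interval, clear its counter;
   R = 0 -> increment the counter. All R bits are then cleared. */
void aging_tick(int *counter, int *rbit, int npages) {
    for (int p = 0; p < npages; p++) {
        if (rbit[p])
            counter[p] = 0;     /* used this interval           */
        else
            counter[p]++;       /* intervals since last access  */
        rbit[p] = 0;
    }
}

/* Victim = page with the largest counter (idle the longest). */
int aging_victim(const int *counter, int npages) {
    int v = 0;
    for (int p = 1; p < npages; p++)
        if (counter[p] > counter[v])
            v = p;
    return v;
}
```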
Second-Chance29
Maintain FIFO page list
When a page frame is needed, check reference bit of top page in list If R = 1, then move page to end of list and clear R,
repeat If R = 0, then evict the page
If page referenced enough, never replaced
Implement with a circular queue
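With the circular-queue implementation, the "move to end of list" step becomes simply advancing a hand pointer. A minimal sketch of victim selection (function name mine):

```c
/* Second-chance victim selection over a circular queue of nframes.
   rbit[i] is the reference bit of the page in frame i; *hand is the
   current queue head. Frames with R = 1 are passed over (R cleared);
   the first frame with R = 0 is evicted. */
int second_chance_victim(int *rbit, int nframes, int *hand) {
    for (;;) {
        if (rbit[*hand] == 0) {
            int victim = *hand;
            *hand = (*hand + 1) % nframes;
            return victim;
        }
        rbit[*hand] = 0;                 /* give a second chance */
        *hand = (*hand + 1) % nframes;
    }
}
```

If every reference bit is set, the hand clears them all and comes back to where it started, which is exactly the FIFO degeneration mentioned on the next slide.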
Second-Chance30
If all 1, degenerates to FIFO
1
2
3
1
41
31
0
1
2
3
0
40
30
0
NextVictim
(a) (b)
Not Recently Used (NRU)31
Enhanced Second-Chance Uses 2 bits, reference bit and modify bit
Periodically clear R bit from all PTE’sWhen needed, rank order pages as follows (R,
M) (0,0) neither recently used nor modified (0,1) not recently used but modified (1,0) recently used but “clean” (1,1) recently used and modified
Evict a page at random from lowest non-empty class
Thrashing32
With multiprogramming, physical memory is shared Kernel code System buffers and system information (e.g. PTs) Some for each process
Each process needs a minimum number of pages in memory to avoid thrashing
If a process does not have “enough” pages, the page-fault rate is very high Low CPU utilization OS thinks it needs increased multiprogramming Adds another process to system
Thrashing33
Degree of multiprogramming
CP
Uu
tili
zati
on
Working-Set Model34
w = working-set window = fixed # of page references total number of pages references in time T
Total = sum of size of w’s
m = number of frames
Working Set Example35
Assume T = 51 2 3 2 3 1 2 4 3 4 7 4 3 3 4 1 1 2 2 2 1
w={1,2,3} w={3,4,7} w={1,2} If T too small, will not encompass locality If T too large, will encompass several localities If T infinity, will encompass entire program
If Total > m thrashing Need to free up some physical memory E.g. , suspend a process, swap all of its pages out
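The three working sets shown above can be computed mechanically. This sketch (function name mine) returns the working-set size at a given position in the reference string:

```c
/* Size of the working set W(t, T): the number of distinct pages among
   the T references ending at index t (inclusive). */
int working_set_size(const int *refs, int t, int T) {
    int pages[64];               /* assumes at most 64 distinct pages */
    int count = 0;
    int start = (t - T + 1 < 0) ? 0 : t - T + 1;
    for (int i = start; i <= t; i++) {
        int seen = 0;
        for (int j = 0; j < count; j++)
            if (pages[j] == refs[i]) { seen = 1; break; }
        if (!seen)
            pages[count++] = refs[i];
    }
    return count;
}
```

On the example string with T = 5, the window ending at the 5th reference gives {1,2,3}, the window ending at the 15th gives {3,4,7}, and the final window gives {1,2}, matching the slide.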
Page Fault Frequency36
increasenumber of
frames
decreasenumber of
frames
upper bound
lower bound
Pag
e F
au
lt R
ate
Number of Frames
•Establish “acceptable” page-fault rate•If rate too low, process loses frame•If rate too high, process gains frame
Review of Page Replacement Algorithms

Algorithm                   Comment
Optimal                     Not implementable, but useful as benchmark
First-In-First-Out (FIFO)   Might throw out important pages
Second Chance               Big improvement over FIFO
Not Recently Used (NRU)     Very crude
Least Recently Used (LRU)   Excellent, but difficult to implement exactly
Working-Set                 Somewhat expensive to implement
Page Size38
Old - Page size fixed, New -choose page sizeHow do we pick the right page size? Tradeoffs:
Fragmentation Table size Minimize I/O
transfer small (.1ms), latency + seek time large (10ms) Locality
small finer resolution, but more faults ex: 200K process (1/2 used), 1 fault / 200k, 100K faults/1
byte
Historical trend towards larger page sizes CPU, memory faster proportionally than disks
Program Structure39
consider: int A[1024][1024];
for (j=0; j<1024; j++)
for (i=0; i<1024; i++)
A[i][j] = 0;
suppose: process has 1 frame 1 row per page 1024 * 1024 page faults!
Program Structure40
consider: int A[1024][1024];
for (i=0; j<1024; j++)
for (j=0; i<1024; i++)
A[i][j] = 0;
1024 page faults
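The two fault counts can be modeled without any paging hardware: track which row (page) is resident in the single frame and count every touch of a different row. The function names below are mine.

```c
/* Model the slides' scenario: one 1024-int row of A per page, and a
   single frame. Each touch of a non-resident row is a page fault. */
enum { N = 1024 };

long faults_inner_i(void) {      /* for j { for i { A[i][j] } } */
    long faults = 0;
    int resident = -1;           /* row currently in the frame */
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            if (i != resident) { faults++; resident = i; }
    return faults;
}

long faults_inner_j(void) {      /* for i { for j { A[i][j] } } */
    long faults = 0;
    int resident = -1;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            if (i != resident) { faults++; resident = i; }
    return faults;
}
```

Column-order traversal changes rows on every access (1024 * 1024 faults); row-order traversal changes rows only 1024 times.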
Process Priorities41
Consider Low priority process faults
Bring page in
Low priority process in ready queue for a while, waiting while high priority process runs
High priority process faults Low priority page clean, not used in a while
Perfect!
Real-Time Processes42
Real-time Bounds on delay Hard real-time: systems crash lives lost
Air-traffic control, factory automation Soft real-time: application performance degradation
Audio, video
Paging adds unexpected delays Avoid it Lock bits for real-time processes
Agenda
Input and Output
Objectives
Explain different types of I/O devices
Explain how to handle various types of I/O interrupts
Give examples of types of I/O devices
Explain scheduling algorithms for the hard disk head
Introduction to Input/Output45
One OS function is to control devices Significant fraction of code (80-90% of Linux)
Want all devices to be simple to use Convenient Ex: stdin/stdout, pipe, re-direct
Want to optimize access to device Efficient Devices have very different needs
Hardware Organization (Simple)46
Device
CPUmemory bus
Device
Hardware Organization (typical Pentium)
[Figure: CPU with level-2 cache connected through a bridge to main memory and the PCI bus; AGP port and graphics card drive the monitor; an ISA bridge connects the ISA bus; IDE disk attached]
Kinds of I/O Device48
Character (and sub-character devices) Mouse, character terminal, joystick, keyboards
Block transfer Disk, tape, CD, DVD Network
Clocks Internal, external
Graphics GUI, games
Multimedia Audio, video
Other Sensors, controllers
Controlling an I/O Device49
A function of host CPU architecture
Special I/O Instructions Opcode to stop, start, query, etc. Separate I/O address space Kernel mode only
Memory-mapped I/O control registers Each register has a physical memory address Writing to data register is output Reading from data register is input Writing to control register causes action Can be mapped to kernel or user-level virtual memory
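The memory-mapped style can be sketched in C. The register layout, names, and command bit below are all hypothetical; on real hardware the struct would be overlaid on the device's physical address, while here it sits in ordinary memory so the sketch can run anywhere.

```c
#include <stdint.h>

/* Hypothetical register layout for a memory-mapped device. */
struct dev_regs {
    volatile uint32_t data;     /* write = output, read = input */
    volatile uint32_t control;  /* writing causes an action     */
    volatile uint32_t status;   /* reading provides information */
};

enum { CTRL_START = 1u };       /* hypothetical command bit */

/* Output one byte: place it in the data register, then poke the
   control register to trigger the transfer. */
void dev_write_byte(struct dev_regs *r, uint8_t b) {
    r->data = b;
    r->control = CTRL_START;
}
```

The `volatile` qualifier matters in real drivers: it keeps the compiler from reordering or eliminating the register accesses.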
I/O Device Types - Character50
Access is serial.Data register:
Register or address where data is read from or written to
Very limited capacity (at most a few bytes)
Action register: When writing to register, causes a physical action Reading from register yields zero
Status register: Reading from register provides information Writing to register is no-op
I/O Device Types - Block51
Access is independentBuffer address register:
Points to area in physical memory to read or write dataOR
Addressable buffer for data E.g., network cards
Action register: When writing to register, initiates a physical action or data
transfer Reading from register yields zero
Status register: Reading from register provides information Writing to register is no-op
Direct Memory Access52
Very Old Controller reads from device OS polls controller for data
Old Controller reads from device Controller interrupts OS OS copies data to memory
DMA Controller reads from device Controller copies data to memory Controller interrupts OS
Direct Memory Access (DMA)53
Ability to control block devices to autonomously read from and/or write to main memory (usually) physical memory
Transfer address: Points to location in physical memory
Action register: Initiates a reading of control block chain to start actions
Status register: Reading from register provides information
Direct Memory Access (DMA)
[Figure: DMA controller transferring data between a device and main memory]
Programmed DMA55
DMA controller
First control block
controlsdisk
operationaddressCount
control infonext
operationaddressCount
control infonext
operationaddressCount
control infonext
…
physicalmemory
Programmed DMA56
DMA control register points to first control block in chain
Each EMA control block has Action & control info for a single transfer of one or
more blocks Data addresses in physical memory (optional) link to next block in chain (optional) interrupt upon completion
Each control block removed from chain upon completion
I/O subsystem may add control blocks to chain while transfers are in progress
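A control block chain maps naturally onto a linked struct. The field names below are my own rendering of the slide's fields (operation, address, count, control info, next); real controllers define their own fixed binary layouts.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical DMA control block mirroring the slide's fields. */
struct dma_cb {
    int op;               /* e.g. read or write                   */
    uint64_t addr;        /* data address in physical memory      */
    uint32_t count;       /* number of blocks to transfer         */
    uint32_t flags;       /* control info, e.g. interrupt-on-done */
    struct dma_cb *next;  /* next control block, or NULL          */
};

/* Walk the chain the way the controller would, counting how many
   transfers it describes. */
int dma_chain_length(const struct dma_cb *cb) {
    int n = 0;
    for (; cb != NULL; cb = cb->next)
        n++;
    return n;
}
```

Because the controller follows `next` pointers on its own, the I/O subsystem can append new blocks to the tail while earlier transfers are still in flight, exactly as the slide describes.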
Principles of I/O Software57
Efficiency – Do not allow I/O operations to become system bottleneck Especially slow devices
Device independence – isolate OS and application programs from device specific details and peculiarities
Uniform naming – support a way of naming devices that is scalable and consistent
Error handling – isolate the impact of device errors, retry where possible, provide uniform error codes Errors are abundant in I/O
Buffering – provide uniform methods for storing and copying data between physical memory and the devices
Uniform data transfer modes – synchronous and asynchronous, read, write, ..
Controlled device access – sharing and transfer modes Uniform driver support – specify interfaces and protocols that
drivers must adhere to
I/O Software “Stack”58
User Level Software
Device IndependentSoftware
Device Drivers
Interrupt Handlers
Hardware
I/O API & libraries
Device Dependent
Device Dependent – as short as possible
(Rest of the OS)
Three Common Ways I/O Can Be Performed
Programmed I/O
Interrupt-Driven I/O
I/O using DMA
Programmed I/O (polling)60
Used when device and controller are relatively quick to process an I/O operation Device driver
Gains access to device Initiates I/O operation Loops testing for completion of I/O operation If there are more I/O operations, repeat
Used in following kinds of cases Service interrupt time > Device response time Device has no interrupt capability Embedded systems where CPU has nothing else to do
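The "loop testing for completion" step is a busy-wait on a status register. A minimal sketch (the bit name is hypothetical; the register is modeled as a plain pointer so the code compiles anywhere):

```c
#include <stdint.h>

enum { STATUS_DONE = 1u };   /* hypothetical "operation complete" bit */

/* Busy-wait (poll) on a device status register until the current
   operation completes. On real hardware `status` would be a
   memory-mapped register address. */
void poll_until_done(volatile uint32_t *status) {
    while ((*status & STATUS_DONE) == 0)
        ;  /* spin: the CPU does nothing useful while waiting */
}
```

This is exactly why polling only makes sense when the device is fast or the CPU has nothing better to do.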
Programmed I/O Example61
Keyboard & mouse buttons implemented as 128-bit read-only register One bit for each key and mouse button 0 = “up”; 1 = “down”
Mouse “wheels” implemented as pair of counters One click per unit of motion in each of x and y directions
Clock interrupt every 10 msec Reads keyboard register, compares to previous copy Determines key & button transitions up or down Decodes transition stream to form character and button
sequence Reads and compares mouse counters to form motion sequence
Other Programmed I/O Examples62
Check status of device
Read from disk or boot device at boot time No OS present, hence no interrupt handlers Needed for bootstrap loading of the inner portions of
kernel
External sensors or controllers Real-time control systems
Interrupt Handling63
Interrupts occur on I/O events Operation completion Error or change of status Programmed in DMA command chain
Interrupt Stops CPU from continuing with current work Saves some context Restarts CPU with new address & stack
Set up by the interrupt vector Target is the interrupt handler
Interrupts
Interrupt Request Lines (IRQs)65
Every device is assigned an IRQ Used when raising an interrupt Interrupt handler can identify the interrupting device
Assigning IRQs In older and simpler hardware, physically by wires
and contacts on device or bus In most modern PCs, etc., assigned dynamically at
boot time
Handling Interrupts (Linux Style)66
Terminology Interrupt context – kernel operating not on behalf of any process Process context – kernel operating on behalf of a particular process User context – process executing in user virtual memory
Interrupt Service Routine (ISR), also called Interrupt Handler The function that is invoked when an interrupt is raised Identified by IRQ Operates on Interrupt stack (as of Linux kernel 2.6)
One interrupt stack per processor; approx 4-8 kbytes Top half – does minimal, time-critical work necessary
Acknowledge interrupt, reset device, copy buffer or registers, etc. Interrupts (usually) disabled on current processor
Bottom half – the part of the ISR that can be deferred to more convenient time Completes I/O processing; does most of the work Interrupts enabled (usually) Communicates with processes Possibly in a kernel thread (or even a user thread!)
Interrupt-Driven I/O Example67
Software Time-of-Day ClockInterrupt occurs at fixed intervals
50 or 60 Hz
Service routine (top half):– Adds one tick to clock counter
Service routine (bottom half):– Checks list of soft timers Simulates interrupts (or posts to semaphores or
signals monitors) of any expired timers
Interrupt-Driven I/O Examples68
Very “slow” character-at-a-time terminals Mechanical printers (15 characters/second) Some keyboards (one character/keystroke)
Command-line completion in many Unix systems Game consoles Serial modems Many I/O devices in “old” computers
Paper tape, punched cards, etc.
Common theme CPU participates in transfer of every byte or word
Direct Memory Access (DMA)
DMA Interrupt Handler70
Service Routine – top half (interrupts disabled) Does as little work as possible and returns
(Mostly) notices completion of one transfer, starts another (Occasionally) checks for status Setup for more processing in upper half
Service Routine – bottom half (interrupts enabled) Compiles control blocks from I/O requests Manages & pins buffers, translates to physical addresses Posts completion of transfers to requesting applications
Unpin and/or release buffers Possibly in a kernel thread
DMA Example71
Streaming TapeRequirement
Move data to/from tape device fast enough to avoid stopping tape motion
Producer-consumer model between application and bottom-half service routine Multiple actions queued up before previous action is
completed Notifies application of completed actions
Top half service routine Records completion of each action Starts next action before tape moves too far
Result:– Ability to read or write many 100’s of megabytes without
stopping tape motion
Other DMA Examples72
Disks, CD-ROM readers, DVD readersEthernet & wireless “modems”Tape and bulk storage devicesCommon themes:
Device controller has space to buffer a (big) block of data
Controller has intelligence to update physical addresses and transfer data
Controller (often) has intelligence to interpret a sequence of control blocks without CPU help
CPU does not touch data during transfer.
Error Detection and Correction73
Most data storage and network devices have hardware error detection and correction
Redundancy code added during writing Parity: detects 1-bit errors, not 2-bit errors Hamming codes
Corrects 1-bit errors, detects 2-bit errors Cyclic redundancy check (CRC)
Detects errors in string of 16- or 32-bits Reduces probability of undetected errors to very, very low
Check during reading Report error to device driver
Error recovery: one of principal responsibilities of a device driver!
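The parity claim above (catches 1-bit errors, misses 2-bit errors) is easy to demonstrate; here is a minimal sketch over a single byte (function names mine):

```c
#include <stdint.h>

/* Even parity over a byte: the XOR of its bits. Storing (data, parity)
   lets a reader detect any single-bit error; a 2-bit error cancels
   out and goes undetected. */
int parity(uint8_t b) {
    int p = 0;
    while (b) {
        p ^= (b & 1);
        b >>= 1;
    }
    return p;
}

int parity_check_ok(uint8_t data, int stored_parity) {
    return parity(data) == stored_parity;
}
```

Flipping one bit of a stored byte makes the check fail; flipping two bits restores the parity and the error slips through, which is why stronger codes (Hamming, CRC) are used where that matters.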
Specific Device
Hard Disk Drives (HDD)
  Controller often on the disk
  Cache to speed access
Hard Disk Drives (HDD)75
Platters 3000-10,000 rpm Tracks Cylinders Sectors
Disk arms all move togetherIf multiple drives
Overlapping seeks but one read/write at a time.
Disk Arm Scheduling76
Read time: Seek time (arm to cylinder) Rotational delay (time for sector under head) Transfer time (take bits off disk)
Seek time dominatesHow does disk arm scheduling affect seek?
First-Come-First-Serve (FCFS)77
14 + 13 + 2 + 6 + 3 + 12 + 3 = 53Service requests in order that they arriveLittle can be done to optimizeWhat if many requests?
1 2 3 4 5 6 7 8 910
11
12
13
14
15
16
17
18
19
20
x x x x x x x
Tim
e
Shortest Seek First (SSF)78
1 + 2 + 6 + 9 + 3 + 2 = 23What if many requests?
1 2 3 4 5 6 7 8 910
11
12
13
14
15
16
17
18
19
20
x x x x x x x
Tim
e
Elevator (SCAN)79
1 + 2 + 6 + 3 + 2 + 17 = 31Usually, a little worse average seek time than SSF
But more fair, avoids starvationC-SCAN has less varianceNote, seek getting faster, rotational not
1 2 3 4 5 6 7 8 910
11
12
13
14
15
16
17
18
19
20
x x x x x x x
Tim
e
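The three policies can be compared with a small simulator. The exact request queue behind the slides' figures is not recoverable from the transcript, so the example below uses a hypothetical queue (head at cylinder 8; requests 2, 15, 7, 18, 4, 13, 9 in arrival order); its totals therefore differ from the slide numbers, but the ranking behavior is the same. Function names are mine.

```c
#include <stdlib.h>

/* Total seek distance servicing requests in arrival order (FCFS). */
int fcfs_total(const int *req, int n, int head) {
    int total = 0;
    for (int i = 0; i < n; i++) {
        total += abs(req[i] - head);
        head = req[i];
    }
    return total;
}

/* Shortest Seek First: always service the closest pending request.
   Served entries are marked -1, so the array is consumed. */
int ssf_total(int *req, int n, int head) {
    int total = 0;
    for (int served = 0; served < n; served++) {
        int best = -1;
        for (int i = 0; i < n; i++)
            if (req[i] >= 0 &&
                (best < 0 || abs(req[i] - head) < abs(req[best] - head)))
                best = i;
        total += abs(req[best] - head);
        head = req[best];
        req[best] = -1;
    }
    return total;
}

/* Elevator (SCAN): sweep toward higher cylinders first, then reverse.
   Total seek = distance up to the highest request, plus (if any
   request lies below the start) the sweep back down to the lowest. */
int scan_total(const int *req, int n, int head) {
    int hi = head, lo = head;
    for (int i = 0; i < n; i++) {
        if (req[i] > hi) hi = req[i];
        if (req[i] < lo) lo = req[i];
    }
    return (hi - head) + (lo < head ? hi - lo : 0);
}
```

On this queue, FCFS travels 65 cylinders, SSF 28, and SCAN 26: FCFS does no optimization at all, while SSF and SCAN both stay close, with SCAN additionally guaranteeing no request starves.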
Redundant Array of Inexpensive Disks (RAID)
For speed: pull data in parallel
For fault-tolerance