Live Updating Operating Systems Using Virtualization
Haibo Chen, Rong Chen, Fengzhe Zhang, Binyu Zang, Fudan University
Pen-Chung Yew, University of Minnesota at Twin Cities
2
Motivation
Operating systems are far from perfect:
• Security holes, design flaws, bugs, new features …
Result: continuous patches and upgrades are required
Difficulties in applying patches and upgrades:
• Disruptive: loss of availability
• Irreversible: risk of a system crash
A live update feature is highly desirable and often critical.
3
What Does a Contemporary OS (COS) Miss?
Requirements to live update an OS:
• Define an updatable unit: difficult, because a COS is monolithic
• Apply the patch at a safe point: some hot spots, such as the root file system and network modules, have no safe point
• Keep consistency: it is difficult for an OS to update itself
4
What Is LUCOS?
"Any problem in computer science can be solved with another level of indirection." (David Wheeler, as quoted in Butler Lampson's 1992 ACM Turing Award lecture)
LUCOS: Live Updating Contemporary Operating Systems using virtualization
• Uses a virtual machine monitor (VMM) to patch operating systems (e.g., Linux)
• Avoids the need for a safe point by allowing the old and new versions of data structures to coexist
• The VMM maintains coherence between the versions and tracks when a live update can finish
5
What Is LUCOS?
A practical live updating system that:
• Applies a broad range of real-life Linux patches on the fly, requiring no safe points and retaining OS transparency
• Supports patches that recover tainted state (e.g., a deadlock)
• Allows committed patches to be rolled back
• Requires minimal update time (< 1 ms) and incurs negligible performance overhead (< 1%)
6
Some Existing Efforts
Dynamic Software Update
• Focuses on live updates to application software
• LUCOS: live updates to operating systems
K42 (Baumann et al., USENIX '05)
• A new operating system designed to support live update
• Tightly bound to object-oriented design techniques
• A safe point is desirable
• LUCOS: transparently supports existing OSes (including non-object-oriented ones) and requires no safe point
7
LUCOS Architecture
8
Two Types of Live Updates
• Updates to code only: only code is modified
• Updates to code with data changes: the changed data may be global single-instance data or multiple-instance data
9
Live Update to Code Only
[Figure: in Linux, the head of the original function code at orig_func_vaddr is overwritten with "jmp patch_func_vaddr"; arrows show (1) the write and (2) the jump to the patch function code at patch_func_vaddr.]
Live update to code only: (1) the Update Server replaces the head of the original function with a jump instruction to the patch function's address; (2) when the OS executes the original function, the jump transfers control to the patch function.
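A minimal sketch of the jump written over the function head, assuming x86 and the 5-byte `jmp rel32` encoding (opcode `0xE9` followed by a signed 32-bit offset measured from the end of the instruction); the instruction encoding is an assumption here. The helpers `encode_jmp_rel32` and `jmp_rel32_target` are hypothetical, and the sketch only computes the bytes rather than writing them into kernel text.

```c
#include <assert.h>
#include <stdint.h>

/* x86 "jmp rel32": 1 opcode byte + 4-byte little-endian offset. */
#define JMP_REL32_LEN 5

/* Hypothetical helper: encode "jmp patch_func_vaddr" as it would be placed
 * at orig_func_vaddr. Returns 0 on success, -1 if out of rel32 range. */
static int encode_jmp_rel32(uint8_t out[JMP_REL32_LEN],
                            uint64_t orig_func_vaddr,
                            uint64_t patch_func_vaddr)
{
    /* The offset is relative to the address just past the jmp itself. */
    int64_t rel = (int64_t)patch_func_vaddr -
                  (int64_t)(orig_func_vaddr + JMP_REL32_LEN);
    if (rel < INT32_MIN || rel > INT32_MAX)
        return -1;
    uint32_t u = (uint32_t)(int32_t)rel;
    out[0] = 0xE9;                          /* opcode: jmp rel32 */
    for (int i = 0; i < 4; i++)             /* offset, little-endian */
        out[1 + i] = (uint8_t)(u >> (8 * i));
    return 0;
}

/* Hypothetical inverse: where a jmp rel32 at insn_vaddr transfers control. */
static uint64_t jmp_rel32_target(const uint8_t insn[JMP_REL32_LEN],
                                 uint64_t insn_vaddr)
{
    assert(insn[0] == 0xE9);
    uint32_t u = 0;
    for (int i = 0; i < 4; i++)
        u |= (uint32_t)insn[1 + i] << (8 * i);
    /* Sign-extend the 32-bit offset without implementation-defined casts. */
    int64_t s = (u & 0x80000000u) ? (int64_t)u - 0x100000000LL : (int64_t)u;
    return insn_vaddr + JMP_REL32_LEN + (uint64_t)s;
}
```

For example, `encode_jmp_rel32(buf, orig_func_vaddr, patch_func_vaddr)` fills `buf` with the five bytes to place at `orig_func_vaddr`, and `jmp_rel32_target(buf, orig_func_vaddr)` recovers `patch_func_vaddr`. A real updater must also make the write safe for CPUs that may be executing the function head at that moment, which this sketch does not model.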
10
Live Update to Code with Data Changes
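[Figure: Linux running on Xen. The head of the original function code at orig_func_vaddr is overwritten with "jmp patch_func_vaddr" ((1) write, (2) jump to the patch function code at patch_func_vaddr); updating instances of the data structures raises (3) an interrupt that runs the state transfer function at state_transfer_func_vaddr.]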
Live update to code with data changes: (1) the Update Server replaces the beginning of the original function with a jump instruction to the patch function's address and write-protects the affected instances of the data structures; (2) when the OS executes the original function, the jump transfers control to the patch function; (3) when the OS updates a protected instance, an interrupt is triggered and the Update Server executes the state transfer function.
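The write-protect-then-transfer step can be illustrated with a small user-space simulation. This is a sketch under stated assumptions, not the LUCOS mechanism itself: the trap that the VMM would take on a write to a protected instance is replaced by an explicit check in a write accessor, and the types `demo_v1`/`demo_v2` and the functions `begin_update`, `write_count`, `read_count`, and `state_transfer` are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical old and new layouts of one data structure. */
struct demo_v1 { int count; };
struct demo_v2 { long count; int flags; };   /* patch widens count, adds flags */

/* One instance: which layout it holds, and whether it is still protected. */
struct demo_instance {
    bool write_protected;
    bool is_v2;
    union { struct demo_v1 v1; struct demo_v2 v2; } u;
};

static int transfers;   /* number of state transfers performed */

/* State transfer function: convert one instance from old to new layout. */
static void state_transfer(struct demo_instance *inst)
{
    struct demo_v1 old = inst->u.v1;   /* copy before overwriting the union */
    inst->u.v2.count = old.count;
    inst->u.v2.flags = 0;              /* default value for the new field */
    inst->is_v2 = true;
    transfers++;
}

/* Stand-in for the update server write-protecting the affected instances. */
static void begin_update(struct demo_instance *insts, int n)
{
    for (int i = 0; i < n; i++)
        insts[i].write_protected = true;
}

/* Writes from patched code. Hitting a protected instance plays the role of
 * the trap: transfer its state, drop protection, then perform the write. */
static void write_count(struct demo_instance *inst, long value)
{
    if (inst->write_protected) {
        state_transfer(inst);
        inst->write_protected = false;
    }
    assert(inst->is_v2);
    inst->u.v2.count = value;
}

/* Reads work on either layout, so old and new versions can coexist. */
static long read_count(const struct demo_instance *inst)
{
    return inst->is_v2 ? inst->u.v2.count : inst->u.v1.count;
}
```

Instances that are never written stay in the old layout, and each written instance is converted exactly once, on its first write after the update begins.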
11
Termination of a Live Update
A live update terminates when all threads have left the original functions.
Stack inspection (Altekar et al., USENIX Security '05):
• Maintain a list of threads executing in original functions
• Remove threads as they leave the original functions
• Terminate the live update when the list is empty
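The termination check can be sketched as a filter over per-thread stack snapshots. This assumes each snapshot is a list of return addresses and each original function is described by an address range `[start, end)`; the types and `prune_waiting_threads` are hypothetical, and actual stack walking is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Address range [start, end) of one original (replaced) function. */
struct func_range { uintptr_t start, end; };

/* Hypothetical snapshot of one thread's stack: its return addresses. */
struct thread_snapshot {
    int tid;
    const uintptr_t *frames;
    size_t nframes;
};

static bool in_original_function(uintptr_t pc,
                                 const struct func_range *orig, size_t norig)
{
    for (size_t i = 0; i < norig; i++)
        if (pc >= orig[i].start && pc < orig[i].end)
            return true;
    return false;
}

/* A thread is still inside an original function if any frame points into one. */
static bool thread_in_original(const struct thread_snapshot *t,
                               const struct func_range *orig, size_t norig)
{
    assert(t->frames != NULL || t->nframes == 0);
    for (size_t f = 0; f < t->nframes; f++)
        if (in_original_function(t->frames[f], orig, norig))
            return true;
    return false;
}

/* Drop threads that have left all original functions; returns how many remain.
 * The live update can terminate once this returns 0. */
static size_t prune_waiting_threads(struct thread_snapshot *waiting, size_t n,
                                    const struct func_range *orig, size_t norig)
{
    size_t kept = 0;
    for (size_t i = 0; i < n; i++)
        if (thread_in_original(&waiting[i], orig, norig))
            waiting[kept++] = waiting[i];
    return kept;
}
```

Calling `prune_waiting_threads` each time fresh snapshots are taken shrinks the list monotonically, matching the "remove threads as they leave" step.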
12
Patches for Recovering Tainted State
Vision:
• Some bugs can leave the system in a tainted state, e.g., a deadlock
• Simply replacing the buggy code cannot undo a state the bug has already caused
spinlock_t demo_lock = SPIN_LOCK_UNLOCKED;
void foo(void)
{
    ...;
    spin_lock(&demo_lock);
    ...;
    if (condition) {
        return;    /* bug: returns while still holding demo_lock */
    }
    ...;
    spin_unlock(&demo_lock);
}
Code 1: A buggy function with a potential for deadlock.
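extern spinlock_t demo_lock;    /* reuse the original lock; defining a new one would not guard the same data */
void foo_patch(void)
{
    ...;
    spin_lock(&demo_lock);
    ...;
    if (condition) {
        spin_unlock(&demo_lock);    /* fix: release the lock on the early-return path */
        return;
    }
    ...;
    spin_unlock(&demo_lock);
}
Code 2: A patch function that fixes the deadlock problem.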
void state_transfer(void)
{
    /* Release the lock leaked by foo() so that threads spinning on it can proceed. */
    if (spin_is_locked(&demo_lock))
        spin_unlock(&demo_lock);
}
Code 3: A callback function to recover from the deadlocked state.
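To see why the recovery callback is needed in addition to the patched function, here is a user-space analog that can actually run. It assumes a toy spinlock built on C11 atomics in place of the kernel's `spinlock_t`, and it replaces the elided parts and `condition` with a boolean parameter; the `toy_*` names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy user-space spinlock standing in for the kernel's spinlock_t. */
typedef struct { atomic_int locked; } toy_spinlock_t;

static void toy_spin_lock(toy_spinlock_t *l)
{
    int expected = 0;
    while (!atomic_compare_exchange_weak(&l->locked, &expected, 1))
        expected = 0;            /* CAS failed: reset and keep spinning */
}

static void toy_spin_unlock(toy_spinlock_t *l) { atomic_store(&l->locked, 0); }

static bool toy_spin_is_locked(toy_spinlock_t *l)
{
    return atomic_load(&l->locked) != 0;
}

static toy_spinlock_t demo_lock = { 0 };   /* starts unlocked */

/* Analog of the buggy foo(): the early return leaks the lock. */
static void foo(bool condition)
{
    toy_spin_lock(&demo_lock);
    if (condition)
        return;                  /* bug: demo_lock stays held */
    toy_spin_unlock(&demo_lock);
}

/* Analog of foo_patch(): every return path releases the lock. */
static void foo_patch(bool condition)
{
    toy_spin_lock(&demo_lock);
    if (condition) {
        toy_spin_unlock(&demo_lock);
        return;
    }
    toy_spin_unlock(&demo_lock);
}

/* Analog of the recovery callback: clear the lock leaked by foo(). */
static void state_transfer(void)
{
    if (toy_spin_is_locked(&demo_lock))
        toy_spin_unlock(&demo_lock);
}
```

After `foo(true)` leaves `demo_lock` held, redirecting callers to `foo_patch` alone would not help, because its first `toy_spin_lock` would spin forever; running `state_transfer` first clears the leaked lock so the patched code can make progress.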
13
Patches for Recovering Tainted State
Solution: allow callbacks in a live update
Three types of callbacks in LUCOS:
• Function callbacks
• Thread callbacks
• Data callbacks
Example: use thread callbacks to resolve the deadlock
14
Patch Rollback
A special type of patch:
• Uses the original code and data to patch the committed ones
• Changes state held in the new data back to the original data
Resource overhead: the original code and data have to be kept in memory
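The code half of a rollback can be sketched by saving the bytes a patch overwrites, which is why the original code must stay in memory. The record layout and the names `patch_apply`/`patch_rollback` are hypothetical, the function head is modeled as an ordinary byte buffer, and restoring data would additionally need a reverse state transfer, which is not shown.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define HEAD_LEN 5   /* bytes overwritten by an x86 jmp rel32 (assumption) */

/* Hypothetical per-patch record kept so that a committed patch can be undone. */
struct patch_record {
    uint8_t *func_head;        /* start of the original function */
    uint8_t saved[HEAD_LEN];   /* original bytes, kept in memory for rollback */
    int committed;
};

/* Save the original head, then overwrite it with the jump to the patch. */
static void patch_apply(struct patch_record *p, uint8_t *func_head,
                        const uint8_t jmp[HEAD_LEN])
{
    p->func_head = func_head;
    memcpy(p->saved, func_head, HEAD_LEN);
    memcpy(func_head, jmp, HEAD_LEN);
    p->committed = 1;
}

/* Rollback: patch the committed code back with the saved original bytes. */
static void patch_rollback(struct patch_record *p)
{
    assert(p->committed);
    memcpy(p->func_head, p->saved, HEAD_LEN);
    p->committed = 0;
}
```

The memory cost the slide mentions shows up here as `saved`: every committed patch keeps a copy of what it replaced for as long as rollback must remain possible.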
15
Experimental Setup
Implemented on Linux 2.6.10 running on Xen 2.0.5.
System:
• Fedora Core 2 distribution
• 3.0 GHz Pentium IV with 1 GB RAM
• Intel Pro 100/1000 Ethernet NIC on a 100 Mbps LAN
• A single 250 GB 7200 RPM SATA disk
16
Workloads
• SPEC INT 2000: measures the performance of CPU-intensive workloads
• Linux build time: measures the overall time to build the Linux 2.6.10 kernel with gcc-3.3.3
• Open Source Database Benchmark suite (OSDB): Information Retrieval (IR) and Online Transaction Processing (OLTP)
17
Experience with Real-Life Patches
Five typical patches were selected from two Linux upgrades:
• The upgrade of the Linux kernel from 2.6.10 to 2.6.11
• The upgrade of the backend block device drivers in Xen-Linux
No. Patch type Description
1 Type 1 Fixing the page reading bug
2 Type 1 Removal of livelock avoidance
3 Type 2 Upgrading the process scheduler
4 Type 2 Reconstruction of the IRQ descriptors
5 Type 2 Upgrading backend block device drivers in Xen-Linux
18
Time to Apply and Roll Back Live Updates
Note: OSDB-IR/OLTP were running in the background while the patches were applied and rolled back.
19
Relative Performance (Normal Execution)
20
Conclusions
• Existing operating systems can be live updated
• No safe point is required
• Patches can recover tainted state
• Rollback of a live update is supported
• The time overhead to apply a live update is minimal
• The performance overhead is negligible
21
Future Work
• Avoid the performance overhead of virtualization
• Integrate LUCOS with our self-virtualization system to virtualize operating systems on demand
22
Questions?
Our contact information: Parallel Processing Institute, Fudan University, China
Phone: +86-21-51355363
Fax: +86-21-65646571
23
24
Patch File Format in LUCOS
Follows the format of Linux kernel modules, and adds:
• New declarations of data structures
• Callback functions
• Patch startup and patch cleanup functions
• State transfer functions
25
Fine-Grained Memory Protection
• Exploiting ECC memory (Qin et al., HPCA '05): cache-line granularity
• Mondrian memory protection (Witchel et al., ASPLOS-X): word-level memory protection
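The practical difference between these granularities can be worked out with a little arithmetic: protecting one object means protecting every granule it touches, so coarser granules also protect neighboring data whose writes would fault as well. The helper `protected_bytes` is hypothetical, and the granule sizes used below (4 KB pages, 64-byte cache lines, 4-byte words) are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes actually protected when protecting [addr, addr + size) at a given
 * granularity: the range is widened to granule boundaries on both ends. */
static uint64_t protected_bytes(uint64_t addr, uint64_t size, uint64_t granule)
{
    assert(granule > 0 && size > 0);
    uint64_t lo = addr / granule * granule;                          /* round down */
    uint64_t hi = (addr + size + granule - 1) / granule * granule;  /* round up */
    return hi - lo;
}
```

For a 32-byte object at address 0x1010, this gives 4096 bytes at page granularity, 64 bytes at cache-line granularity, and 32 bytes at word granularity; if the same object straddled a page boundary, page-level protection would cover 8192 bytes.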
26
Self-Virtualization: Architecture
• The OS can switch among the three modes quickly and on the fly
• Applications are completely unaware of the mode switch
• Hosting mode is used to host other OSes
• Migrating mode prepares the OS to self-migrate to another machine