Univ. of Tehran Distributed Operating Systems
Advanced Operating Systems
University of Tehran, Dept. of EE and Computer Engineering
By: Dr. Nasser Yazdani
Lecture 3: OS design
How to design an OS
Some general guides and experiences.
References:
“The Computer for the 21st Century”, Mark Weiser
“Exokernel: An Operating System Architecture for Application-Level Resource Management”, Dawson R. Engler, M. Frans Kaashoek, et al.
“On Micro-Kernel Constructions”
Outline
New applications/requirements
Organizing operating systems
Some microkernel examples
Object-oriented organizations
Spring
Organization for multiprocessors
New vision
Two important problems: location and scale.
Ubiquitous computing: tiny kernels of functionality
Virtual reality
Mobility
Intelligent devices
"Distributed computing": make networks appear like disks, memory, or other non-networked devices.
Ubiquitous computing
Transparent computing is the ultimate goal
Computers should disappear into the background
Computation becomes part of the environment
Computing everywhere: desktop, laptop, palmtop; cars, cell phones; shoes, clothing, walls (paper / paint)
Connectivity everywhere: broadband, wireless
Mobile everywhere: users move around; disposable devices
Ubiquitous Computing Structure
Resource and service discovery is critical
User location is an issue
Interface discovery
Disconnected operation
Ad-hoc organization
Security
Small devices with limited power
Intermittent connectivity
Agents
Sensor networks
Grid Computing
Federated system: no single controlling authority
Scheduling: processors, bandwidth, and other resources
Policy is an important issue: reliability, security, who can use which resources, and what one is willing to use
Systems: Globus Toolkit, Condor
Related but not grid: CORBA, DCOM, DCE
Applications: distributed supercomputing
Peer-to-Peer Computing
Locating cooperative elements
Scalability
OS support
Security
Policies
P2P File Sharing Issues
Naming
Data discovery
Availability
Security
Encryption
Fault tolerance
Conflict resolution
Replication
Other Peer-to-Peer Technologies
Ad-hoc networking: untrusted nodes are used to relay messages over multiple routes (distributed and replicated). This extends range, reduces power, and increases aggregate bandwidth, but it increases latency and makes management more difficult.
Sensor networks: an application of ad-hoc networking that adds processing/reduction in the network
What is the big deal?
Performance: border crossings are expensive (change in locality, copying between user and kernel buffers)
Application requirements differ in terms of resource management
Operating System Organization
What is the best way to design an operating system?
Put another way, what are the important software characteristics of an OS?
What should go in the OS kernel and what in applications, and how should functionality be partitioned? Is there a minimal set for the kernel?
Important OS Software Characteristics
Correctness and simplicity
Power and completeness
Performance
Extensibility and portability
Flexibility
Scalability
Suitability for distributed and parallel systems
Compatibility with existing systems
Security and fault tolerance
Common OS Organizations
Monolithic
Virtual machine
Layered designs
Kernel designs
Microkernels
Object-oriented
Note that individual OS components can be organized in these ways
Trade-off between generality and specialization
What are we shooting for?
The OS should be thin (like a microkernel), providing only mechanisms and not embodying policies (i.e. management)
Fine-grained access to system resources while avoiding border crossings as much as possible (like DOS)
Allow flexible extensions for management of resources (like a microkernel) without sacrificing safety (like a monolithic kernel)
Monolithic OS Design
Build the OS as a single combined module, hopefully using data abstraction, compartmentalized functions, etc.
The OS lives in its own single address space
Examples: DOS, early Unix systems, most VFS file systems
Pros/Cons of Monolithic OS Organization
+ Highly adaptable (at first...)
+ Little planning required
+ Potentially good performance
– Hard to extend and change
– Eventually becomes extremely complex
– Eventually performance becomes poor
– Highly prone to bugs
Virtual Machine Organizations
A base operating system provides services in a very generic way
One or more other operating systems live on top of the base system, using the services it provides to offer different views of the system to users
Examples: IBM’s VM/370, the Java interpreter
Pros/Cons of Virtual Machine Organizations
+ Allows multiple OS personalities on a single machine
+ Good OS development environment
+ Can provide good portability of applications
– Significant performance problems, especially with more than two layers
– Lacking in flexibility
Old idea: VM/370
Virtualization for binary support of legacy apps
Why the resurgence today? Companies want a share of everybody’s pie
IBM zSeries “mainframes” support virtualization for server consolidation
Enables billing and performance isolation while hosting several customers
Microsoft has announced virtualization plans to allow easy upgrades and hosting Linux!
You can see the dots connecting up: from extensibility (à la SPIN) to virtualization
Possible Virtualization Approaches
Standard OS (such as Linux, Windows)
Meta services (such as grid) for users to install files and run processes
Administration, accountability, and performance isolation become hard
Retrofit performance isolation into OSs: Linux/RK, QLinux, SILK
Accounting for resource usage correctly can be an issue unless done at the lowest level (e.g. Exokernel)
Xen approach: multiplex physical resources at OS granularity
Full Virtualization
Virtual hardware is identical to the real one
Relies on the hosted OS trapping to the VMM for privileged instructions
Pros: runs an unmodified OS binary on top
Cons:
Supervisor instructions can fail silently on some hardware platforms (e.g. x86)
Solution in VMware: dynamically rewrite portions of the hosted OS to insert traps
The hosted OS sometimes needs to see real resources: real time, page-coloring tricks for optimizing performance, etc.
Xen Principles
Support for unmodified application binaries
Support for multi-application OSes: complex server configurations within a single OS instance
Paravirtualization for strong resource isolation on uncooperative hardware (x86)
Paravirtualization to enable optimizing guest OS performance and correctness
Xen: VM Management
What would make VM virtualization easy:
A software-managed TLB
A tagged TLB, so there is no TLB flush on a context switch
x86 has neither
Xen approach: the guest OS is responsible for allocating and managing the hardware page tables (PT)
Xen occupies the top 64 MB of every address space. Why?
Layered OS Design
Design a tiny innermost layer of software
The next layer out provides more functionality, using services provided by the inner layer
Continue adding layers until all required functionality has been provided
Examples: Multics, Fluke, layered file systems and communication protocols
Pros/Cons of Layered Organization
+ More structured and extensible
+ Easy model and development
– Performance: layer crossings can be expensive
– In some cases, unnecessary layers and duplicated functionality
Kernel OS Designs
Similar to layers, but with only two OS layers: kernel OS services and non-kernel OS services
Move certain functionality outside the kernel: file systems, libraries
Unlike virtual machines, the kernel doesn’t stand alone
Examples: most modern Unix systems
Pros/Cons of Kernel OS Organization
+ Many advantages of layering, without the disadvantage of too many layers
+ Easier to demonstrate correctness
– Not as general as layering
– Offers no organizing principle for other parts of the OS, user services
– Kernels tend to grow into monoliths
Object-Oriented OS Design
Design the internals of the OS as a set of privileged objects, using OO methods
Sometimes extended into application space
Tends to lead to a client/server style of computing
Examples: Mach (internally), Spring (totally)
Object-Oriented Organizations
Object-oriented organization is increasingly popular
Well suited to OS development, in some ways:
OSes manage important data structures
OSes are modularizable
Strong interfaces are good in OSes
Object-Orientation and Extensibility
One of the main advantages of object-oriented programming is extensibility
Operating systems increasingly need extensibility
So, again, object-oriented techniques are a good match for operating system design
How object-oriented should an OS be?
Many OSes have been built with object-oriented techniques, e.g., Mach and Windows NT
But most of them leave object orientation at the microkernel boundary, with no attempt to force object orientation on out-of-kernel modules
Pros/Cons of Object-Oriented OS Organization
+ Offers an organizational model for the entire system
+ Easily divides the system into pieces
+ Good hooks for security
– Can be a limiting model
– Must watch for performance problems
Not widely used yet
Microkernel OS Design
Like kernels, but fewer abstractions are exported (threads, address spaces, communication channels)
Try to include only a small set of required services in the microkernel
Moves even more out of the innermost OS part, like parts of VM, IPC, paging, etc.
System services (e.g. the VM manager) are implemented as servers on top
High communication overhead between services implemented at user level and the microkernel limits extensibility in practice
Examples: Mach, Amoeba, Plan 9, Windows NT, Chorus, Spring, etc.
Pros/Cons of Microkernel Organization
+ Those of kernels, plus:
+ Minimizes code for the most important OS services
+ Offers a model for the entire system
– Microkernels tend to grow into kernels
– Requires very careful initial design choices
– Serious danger of bad performance
Organizing the Total System
In microkernel organizations, much of the OS is outside the microkernel
But that doesn’t answer the question of how the system as a whole gets organized
How do you fit the components together to build an integrated system, while maintaining all the advantages of the microkernel?
Some Important Microkernel Designs
Micro-ness is in the eye of the beholder
Mach
Spring
Amoeba
Plan 9
Windows NT
Mach
Mach didn’t start life as a microkernel; it became one in Mach 3.0
Object-oriented internally, but doesn’t force OO at higher levels
Microkernel focus is on communication facilities
Much concern with parallel/distributed systems
Mach Model
[Figure: user processes in user space run on a software emulation layer (4.3BSD, SysV, HP/UX, and other emulators), which runs on the microkernel in kernel space.]
What’s in the Mach Microkernel?
Tasks and threads
Ports and port sets
Messages
Memory objects
Device support
Multiprocessor/distributed support
Mach Tasks
An execution environment providing the basic unit of resource allocation
Contains: a virtual address space, a port set, and one or more threads
Mach Task Model
[Figure: a task (process) with its address space and thread, together with its process port, bootstrap port, exception port, and registered ports; the figure spans user space and kernel space.]
Mach Threads
Basic unit of Mach execution
Runs in the context of one task
All threads in one task share its resources
A Unix process is similar to a Mach task with a single thread
Task and Thread Scheduling
Very flexible: controllable by the kernel or by user-level programs
Threads of a single task can execute concurrently on a single processor or in parallel on multiple processors
User-level scheduling can extend to multiprocessor scheduling
Mach Ports
Basic Mach object reference mechanism
Kernel-protected communication channel
Tasks communicate by sending messages to ports
Threads in receiving tasks pull messages off a queue
Ports are location independent
Port queues are protected by the kernel and bounded
Port Rights
The mechanism by which tasks control who may talk to their ports
The kernel prevents messages from being sent to a port unless the sender holds the corresponding port right
Port rights also control which single task receives on a port
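The port and port-right rules above can be sketched as a toy in-memory model, assuming a "kernel" object that owns every port: each port has a bounded queue, a set of tasks holding send rights, and exactly one task holding the receive right. `Kernel`, `allocate_port`, `grant_send_right`, `send`, and `receive` are hypothetical names for illustration only; real Mach uses calls such as `mach_port_allocate` and `mach_msg`.

```python
from collections import deque


class PortError(Exception):
    """Raised when a task lacks the right needed for a port operation."""


class Port:
    def __init__(self, receiver, capacity):
        self.receiver = receiver    # the single task holding the receive right
        self.senders = {receiver}   # tasks holding send rights (receiver may send to itself)
        self.queue = deque()        # kernel-protected message queue
        self.capacity = capacity    # port queues are bounded


class Kernel:
    """Toy kernel that owns every port; tasks only see integer port names."""

    def __init__(self):
        self.ports = {}
        self.next_name = 1

    def allocate_port(self, task, capacity=4):
        name = self.next_name
        self.next_name += 1
        self.ports[name] = Port(task, capacity)
        return name

    def grant_send_right(self, granter, name, task):
        port = self.ports[name]
        if granter != port.receiver:
            raise PortError("only the receive-right holder may grant send rights")
        port.senders.add(task)

    def send(self, task, name, msg):
        port = self.ports[name]
        if task not in port.senders:
            raise PortError(f"{task} holds no send right for port {name}")
        if len(port.queue) >= port.capacity:
            return False            # queue full; a real Mach sender would block or time out
        port.queue.append(msg)
        return True

    def receive(self, task, name):
        port = self.ports[name]
        if task != port.receiver:
            raise PortError(f"{task} does not hold the receive right for port {name}")
        return port.queue.popleft() if port.queue else None
```

Returning `False` on a full queue keeps the sketch single-threaded; the point it illustrates is that every operation goes through the kernel, which checks the caller's right before touching the queue.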
Port Sets
A group of ports sharing a common message queue
A thread can receive messages from a port set, thus servicing multiple ports
Messages are tagged with the actual port
A port can be a member of at most one port set
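A minimal sketch of the port-set rules above, under toy assumptions: member ports feed one shared queue, each queued message is tagged with the port it was actually sent to, and adding a port to a second set is rejected. `PortSet` and `PortSetKernel` are hypothetical names, not the Mach API.

```python
from collections import deque


class PortSet:
    """Hypothetical model of a Mach port set: member ports share one queue."""

    def __init__(self):
        self.members = set()
        self.queue = deque()        # common message queue for all member ports


class PortSetKernel:
    def __init__(self):
        self.set_of = {}            # port name -> the one PortSet it belongs to

    def add_member(self, pset, port):
        if port in self.set_of:
            raise ValueError(f"port {port} is already in a port set")
        self.set_of[port] = pset
        pset.members.add(port)

    def deliver(self, port, msg):
        # A message sent to a member port lands in the set's shared queue,
        # tagged with the port it was actually sent to. (Ports outside any
        # set are not modeled here.)
        self.set_of[port].queue.append((port, msg))

    def receive(self, pset):
        # One receiving thread services every member port through the set.
        return pset.queue.popleft() if pset.queue else None
```

A server thread that loops on `receive(pset)` can thus handle requests for many objects while still knowing, from the tag, which port each request targeted.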
Mach Messages
Typed collection of data objects
Unlimited size
Sent to a particular port
May contain actual data or a pointer to data
Port rights may be passed in a message
The kernel inspects messages for particular data types (like port rights)
Mach Memory Objects
A source of memory accessible by tasks
May be managed by a user-mode external memory manager, e.g., a file managed by a file server
Accessed by messages through a port
The kernel manages physical memory as a cache of the contents of memory objects
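The last point, physical memory as a cache of memory-object contents, can be sketched as follows. The assumptions are loud here: `FilePager` stands in for a user-mode external memory manager, a direct method call stands in for the request message sent to its port, and eviction is simple FIFO with no dirty-page write-back. All names are hypothetical.

```python
class FilePager:
    """Hypothetical external memory manager backing a memory object with file data."""

    def __init__(self, data, page_size):
        self.data = data
        self.page_size = page_size
        self.requests = 0           # page-in requests that reached the pager

    def page_in(self, page_no):
        # Stands in for the kernel sending a request message to the pager's port.
        self.requests += 1
        start = page_no * self.page_size
        return self.data[start:start + self.page_size]


class KernelPageCache:
    """Kernel treats its physical frames as a cache of memory-object pages."""

    def __init__(self, frames):
        self.frames = frames
        self.cache = {}             # (object id, page number) -> page contents

    def read(self, obj_id, pager, page_no):
        key = (obj_id, page_no)
        if key not in self.cache:
            if len(self.cache) >= self.frames:
                # Evict the oldest entry (dicts keep insertion order); a real
                # kernel would first return dirty pages to the pager.
                self.cache.pop(next(iter(self.cache)))
            self.cache[key] = pager.page_in(page_no)
        return self.cache[key]
```

Only cache misses reach the pager, which is what lets Mach keep file systems and other backing stores outside the kernel without paying a message round trip on every access.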
Mach Device Support
Devices are represented by ports
Messages control the device and its data transfer
The actual device driver is outside the kernel, in an external object
Mach Multiprocessor and DS Support
Messages and ports can extend across processor/machine boundaries: location-transparent entities
The kernel manages distributed hardware: per-processor data structures, but also structures shared across the processors
Intermachine messages are handled by a server that knows about network details
Mach’s NetMsgServer
User-level, capability-based networking daemon
Handles naming and transport for messages
Provides a world-wide name service for ports
Messages sent to off-node ports go through this server
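A toy sketch of the NetMsgServer role described above, assuming one server per node, a shared `Network` object standing in for both the wire and the world-wide name service, and method calls standing in for network messages. `check_in`, `look_up`, `send`, and `deliver` are hypothetical method names chosen for illustration.

```python
class Network:
    """Stand-in for the wire plus a world-wide name service for ports."""

    def __init__(self):
        self.servers = {}           # node name -> that node's NetMsgServer
        self.names = {}             # network-wide port name -> (node, local port)


class NetMsgServer:
    """Toy user-level NetMsgServer: names ports and forwards off-node messages."""

    def __init__(self, node, network):
        self.node = node
        self.network = network
        self.queues = {}            # local port -> messages delivered to it
        self.forwarded = 0          # messages this server pushed onto the network
        network.servers[node] = self

    def check_in(self, name, port):
        # Advertise a local port under a network-wide name.
        self.queues.setdefault(port, [])
        self.network.names[name] = (self.node, port)

    def look_up(self, name):
        return self.network.names[name]

    def send(self, dest, msg):
        node, port = dest
        if node == self.node:
            self.queues[port].append(msg)   # local port: no network involved
        else:
            self.forwarded += 1
            self.network.servers[node].deliver(port, msg)

    def deliver(self, port, msg):
        # Receiving side: hand the message to the local port's queue.
        self.queues[port].append(msg)
```

The sender only ever names a port; whether the message stays local or crosses the network is decided by the server, which is how location transparency is kept out of the kernel.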
NetMsgServer in Action
[Figure: on the sender’s machine, a user process passes a message to the local NetMsgServer (both in user space, above kernel space); on the receiver’s machine, the NetMsgServer delivers it to the receiving user process.]
Mach and User Interfaces
Mach was built for the UNIX community
UNIX programs don’t know about ports, messages, threads, and tasks
How do UNIX programs run under Mach?
Mach typically runs a user-level server that offers UNIX emulation
It either provides UNIX system call semantics internally or translates them to Mach primitives
Windows NT
More layered than some microkernel designs
The NT microkernel provides base services
The Executive builds on the base services via modules to provide user-level services
User-level services are used by privileged subsystems (parts of the OS) and by true user programs
Windows NT Diagram
[Figure: the microkernel and the Executive run in kernel mode on the hardware; protected subsystems (Win32, POSIX) and user processes run in user mode.]
NT Microkernel
Thread scheduling
Process switching
Exception and interrupt handling
Multiprocessor synchronization
The only NT part that is not preemptible or pageable
All other NT components run in threads
NT Executive
Higher-level services than the microkernel
Runs in kernel mode, but is separate from the microkernel itself for ease of change and expansion
Built of independent modules, all preemptible and pageable
NT Executive Modules
Object manager
Security reference monitor
Process manager
Local procedure call facility (à la RPC)
Virtual memory manager
I/O manager
Typical Activity in NT
[Figure: a client process interacts with the Win32 protected subsystem; both run on the kernel and Executive, which run on the hardware.]
Windows NT Threads
Executable entity running in an address space
Scheduled by the kernel and handled by the kernel’s dispatcher
The kernel works with a stripped-down view of the thread: the kernel thread object
Multiple process threads can execute on distinct processors, even Executive ones
Microkernel Process Objects
A microkernel proxy for the real process
The microkernel’s interface to the real process
Contains pointers to the various resources owned by the process, e.g., threads and address spaces
Alterable only by microkernel calls
Microkernel Thread Objects
Just as microkernel process objects are proxies for the real process, microkernel thread objects are proxies for the real thread; there is one per thread
Contain minimal information about the thread: priorities, dispatching state
Used by the microkernel for dispatching
More on Microkernels
Microkernels were the research architecture of the 80s
But few commercial systems of the 90s really use microkernels
To some extent, “microkernel” is now a dirty word in OS design. Why?
Microkernel Construction
Most microkernels do not perform well. Is this inherent in the approach, or due to the implementation?
IPC, the microkernel bottleneck, can be implemented an order of magnitude faster
Do not supervise memory: minimal address-space management with grant, map, and flush
Fast kernel-user switch: usually 20-30 µs, but 3 µs in the L3 implementation
Exokernel
Traditional operating systems fix the interface and implementation of OS abstractions.
Abstractions must be overly general to work with diverse application needs.
[Figure: applications sit on a FIXED layer of interface and abstractions, which sits on the hardware.]
Example
[Figure: in a traditional OS, Apache and SQL Server both run on the same FIXED interface and abstractions over the hardware.]
The Issues
Performance: denies applications the advantages of domain-specific optimizations
Flexibility: restricts the flexibility of application builders
Functionality: discourages changes to the implementations of existing abstractions
Performance
Example: a DB can have predictable data access patterns that don’t fit the OS’s LRU page replacement, causing bad performance.
Cao et al. found that application-controlled file caching can reduce running time by as much as 45%.
There is no single way to abstract physical resources, or to implement an abstraction, that is best for all applications.
The OS is forced to make trade-offs
Performance improvements from application-specific policies could be substantial
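The DB example can be made concrete with a small simulation, assuming a query that repeatedly scans a table one page larger than the cache. LRU evicts each page just before it is needed again, so it never hits; an application that knows the pattern could ask for MRU replacement instead. `count_hits` is a hypothetical helper that only illustrates the point; it does not reproduce Cao et al.’s measurements.

```python
from collections import OrderedDict


def count_hits(trace, frames, policy):
    """Count cache hits for a page-reference trace under 'lru' or 'mru' replacement."""
    cache = OrderedDict()           # keys ordered from least to most recently used
    hits = 0
    for page in trace:
        if page in cache:
            hits += 1
            cache.move_to_end(page)  # now the most recently used page
            continue
        if len(cache) >= frames:
            # LRU evicts the first (oldest) key; MRU evicts the last (newest) key.
            cache.popitem(last=(policy == "mru"))
        cache[page] = True
    return hits


# A query that repeatedly scans a 5-page table through a 4-frame cache.
scan = list(range(5)) * 10
lru_hits = count_hits(scan, 4, "lru")   # every reference misses
mru_hits = count_hits(scan, 4, "mru")   # most references hit
```

A fixed OS policy has to pick one of these for everyone; letting the application choose is exactly the flexibility the exokernel argument asks for.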
Flexibility
Fixed high-level abstractions hide information from applications.
This makes it difficult or impossible for applications to implement their own resource management abstractions.
Functionality
There is only one available interface between applications and hardware resources.
Because all applications must share one set of abstractions, changes to these abstractions occur rarely, if ever.
The Solution
Separate protection from management
Allow the user level to manage resources: application libraries implement OS abstractions
The exokernel exports resources through a low-level interface: it protects but does not manage, and it exposes the hardware
Exokernel Philosophy
Applications know better than operating systems what the goal of their resource management decisions should be
Applications should be given as much control as possible over those decisions
Implementation view: [Figure: the exokernel sits directly on the hardware and exposes the frame buffer, TLB, network, memory, and disk.]
Example
[Figure: SQL Server runs on a library OS customized for SQL Server, and Apache runs on a library OS chosen from those available; each library OS provides its own interface and abstractions on top of the exokernel, which provides application-level resource management over the hardware.]
Implementation Overview
The library OS uses the low-level exokernel interface to implement higher-level abstractions.
[Figure: a library OS above the exokernel, which exposes the hardware’s frame buffer, TLB, network, memory, and disk.]
Implementation Overview
Applications link to a library kernel, leveraging its higher-level abstractions.
[Figure: two applications, each linked to its own library OS, above the exokernel and the hardware (frame buffer, TLB, network, memory, disk).]
End-to-End Argument
“If something has to be done by the user program itself, it is wasteful to do it in a lower level as well.”
Why should the OS do anything that the user program can do itself?
In other words, all an OS should do is securely allocate resources.
Exokernel Design
Exokernel Tasks
Track ownership
Guard all resources through bind points
Revoke access to resources
Design Principle
Expose hardware (securely)
Expose allocation
Expose names
Expose revocation
Secure Binding
Decouples authorization from use
Allows the kernel to protect resources without understanding their semantics
Example: TLB entry
The virtual-to-physical mapping is performed in the library (above the exokernel)
The binding is loaded into the kernel and used multiple times
Example: packet filter
Predicates are loaded into the kernel and checked on each packet arrival
Implementing Secure Bindings
Hardware mechanisms: capabilities for physical pages of a file; frame buffer regions (SGI)
Software caching: the exokernel’s large software TLB overlaying the hardware TLB
Downloading code into the kernel: avoids expensive boundary crossings; similar to the SPIN idea
Examples of Secure Binding
Physical memory allocation (hardware-supported binding): the library allocates a physical page; the exokernel records the allocator and the permissions and returns a “capability”, an encrypted cipher. Every access to this page by the library requires this capability.
Page fault:
The kernel fields it
The kernel kicks it up to the library
The library allocates a page and gets an encrypted capability
The library calls the kernel to enter a particular translation into the TLB by presenting the capability
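The page-allocation binding above can be sketched as follows. The slide calls the capability an “encrypted cipher”; as an assumption, this sketch uses an HMAC keyed with a kernel-only secret as the unforgeable token, which is one standard way to build such a value. `Exokernel`, `alloc_page`, and `install_mapping` are hypothetical names, not the Aegis interface. The capability is checked once at bind time, when the translation enters the TLB; later accesses use the cached mapping.

```python
import hashlib
import hmac
import secrets


class Exokernel:
    """Toy exokernel guarding physical pages with unforgeable capabilities."""

    def __init__(self, num_pages):
        self._key = secrets.token_bytes(32)    # secret known only to the kernel
        self.free = set(range(num_pages))
        self.owner = {}                        # physical page -> (libOS, perms)
        self.tlb = {}                          # (libOS, virtual page) -> (physical page, perms)

    def _mac(self, page, lib, perms):
        msg = f"{page}:{lib}:{perms}".encode()
        return hmac.new(self._key, msg, hashlib.sha256).hexdigest()

    def alloc_page(self, lib, page, perms="rw"):
        # Exposed allocation: the library asks for a specific physical page.
        if page not in self.free:
            raise PermissionError(f"page {page} is not free")
        self.free.remove(page)
        self.owner[page] = (lib, perms)        # record allocator and permissions
        return self._mac(page, lib, perms)     # the capability

    def install_mapping(self, lib, vpage, page, perms, cap):
        # Bind time: verify the capability once, then cache the translation.
        if self.owner.get(page) != (lib, perms):
            raise PermissionError(f"{lib} does not own page {page} with perms {perms!r}")
        if not hmac.compare_digest(cap, self._mac(page, lib, perms)):
            raise PermissionError("invalid capability")
        self.tlb[(lib, vpage)] = (page, perms)
```

The kernel never interprets what the library does with the page; it only checks who owns it and with which rights, which is the sense in which protection is separated from management.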
Download code into the kernel to establish a secure binding
Packet filter for demultiplexing network packets, very similar to SPIN
How to ensure authenticity? Only trusted servers (library OSes) can download code into the kernel
Other uses of downloaded code: execute code on behalf of an app that is not currently scheduled
E.g., an application handler for garbage collection could be installed in the kernel
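A minimal sketch of a downloaded packet filter, assuming filters are expressed as declarative predicates (lists of byte-offset/expected-value checks) that the kernel can evaluate safely, rather than arbitrary code. The packet layout (byte 0 as a protocol number, bytes 1-2 as a port) and the `PacketDemux` name are purely illustrative assumptions.

```python
class PacketDemux:
    """Toy in-kernel demultiplexer running downloaded packet filters."""

    def __init__(self, trusted):
        self.trusted = set(trusted)  # library OSes allowed to download filters
        self.filters = []            # (owner, [(offset, expected_bytes), ...])

    def download(self, lib, predicate):
        # Bind time: only trusted libOSes may install a filter.
        if lib not in self.trusted:
            raise PermissionError(f"{lib} may not download code")
        self.filters.append((lib, predicate))

    def on_packet(self, packet):
        # Access time: evaluate the predicates against each arriving packet.
        for owner, predicate in self.filters:
            if all(packet[off:off + len(val)] == val for off, val in predicate):
                return owner
        return None                  # no filter claimed the packet
```

Because the check runs inside the kernel on arrival, the owning application need not be scheduled, and no extra boundary crossing is spent just to find out who should receive the packet.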
Visible Resource Revocation
Most resources are visibly revoked, e.g., the processor or a physical page
The library can then perform the necessary action before relinquishing the resource, e.g., saving state for a processor or updating the page table
Abort Protocol
A repossession exception is passed to the library OS
Repossession vector: gives the library OS information about what was repossessed, so that corrective action can be taken
The library OS can seed the vector to enable the exokernel to autosave (e.g., the disk blocks to which a physical page being repossessed should be written)
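Visible revocation and the abort protocol can be sketched together. Assumptions: a refusal from the libOS stands in for missing the revocation deadline, the libOS seeds a page-to-disk-block map that the kernel uses for autosave, and the repossession vector is a plain list the kernel appends to. `LibOS`, `RevokingKernel`, and their methods are hypothetical names.

```python
class LibOS:
    """Toy library OS that owns physical pages."""

    def __init__(self, name, cooperative=True):
        self.name = name
        self.cooperative = cooperative
        self.pages = {}             # physical page -> contents
        self.writeback = {}         # seeded by the libOS: physical page -> disk block
        self.repossessed = []       # repossession vector, filled in by the kernel

    def revoke_request(self, page):
        # Visible revocation: a cooperative libOS saves state and gives the page back.
        if not self.cooperative:
            return False            # stands in for missing the revocation deadline
        self.pages.pop(page)
        return True


class RevokingKernel:
    def __init__(self):
        self.disk = {}              # disk block -> saved contents

    def reclaim(self, lib, page):
        if lib.revoke_request(page):
            return "released"
        # Abort protocol: break the binding by force.
        contents = lib.pages.pop(page)
        block = lib.writeback.get(page)
        if block is not None:
            self.disk[block] = contents   # autosave where the libOS asked
        lib.repossessed.append(page)      # tell the libOS what was taken
        return "repossessed"
```

The cooperative path lets the libOS guide deallocation; the forced path guarantees the kernel can always get the resource back, while the seeded vector keeps the libOS’s data from being lost.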
Aegis – an Exokernel
Aegis – Processor Time Slice
Linear vector of time slots, scheduled round robin
An application can mark its “position” in the vector for scheduling
Timer interrupts at the beginning and end of time slices: control is transferred to a library-specified handler for the actual saving/restoring
The time to save/restore is bounded
Penalty for exceeding the bound? Loss of a time slice next time!
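The time-slice scheme above can be sketched as a slot vector, assuming abstract integer time units for the save cost and a single round-robin pass per call. `SliceVector`, `allocate`, and `run_round` are hypothetical names; the sketch only shows the position-in-vector idea and the lose-your-next-slice penalty.

```python
class SliceVector:
    """Toy model of Aegis-style processor time slices."""

    def __init__(self, num_slots, save_bound):
        self.slots = [None] * num_slots  # slot index -> owning environment
        self.save_bound = save_bound     # max time units allowed for save/restore
        self.penalized = set()           # environments that lose their next slice

    def allocate(self, env, position):
        # An application chooses its position in the vector.
        if self.slots[position] is not None:
            raise ValueError(f"slot {position} is taken")
        self.slots[position] = env

    def run_round(self, save_costs):
        """Run one round-robin pass; return the environments that actually ran."""
        ran = []
        for env in self.slots:
            if env is None:
                continue
            if env in self.penalized:
                self.penalized.discard(env)  # penalty served: skip this slice
                continue
            ran.append(env)
            # End of slice: the library's handler saves state. Overrunning the
            # bound costs the environment its next slice.
            if save_costs.get(env, 0) > self.save_bound:
                self.penalized.add(env)
        return ran
```

Bounding the save time is what lets the kernel hand context switching to untrusted library code while still guaranteeing that other environments get their slots.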
Aegis – Processor Environments
Exception context: program generated
Interrupt context: external, e.g. timer
Protected entry context: cross-domain calls
Addressing context: guaranteed mappings implemented by a software TLB mimicking the library OS page table
Aegis Performance
Aegis – Address Translation
On a TLB miss:
The kernel installs the hardware entry from the software TLB for guaranteed mappings
Otherwise, the application handler is called
The application establishes the mapping and presents the TLB entry, with its associated capability, to the kernel
The kernel installs it and resumes execution of the application
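The TLB-miss path above can be sketched as follows, assuming capabilities are opaque strings compared against a kernel table and the application handler returns a `(physical page, capability)` pair. One further assumption, labeled in the code: after a successful upcall the mapping is also cached in the software TLB. `AegisMMU` and `tlb_miss` are hypothetical names.

```python
class AegisMMU:
    """Toy model of the Aegis TLB-miss path with a software TLB (STLB)."""

    def __init__(self, page_caps):
        self.page_caps = page_caps  # physical page -> capability that guards it
        self.stlb = {}              # guaranteed mappings: virtual page -> physical page
        self.hw_tlb = {}            # hardware TLB contents: virtual page -> physical page
        self.upcalls = 0            # times the application handler was invoked

    def tlb_miss(self, vpage, app_handler):
        if vpage in self.stlb:
            # Fast path: the kernel refills the hardware TLB itself.
            self.hw_tlb[vpage] = self.stlb[vpage]
            return
        # Slow path: upcall to the application's (library OS's) handler.
        self.upcalls += 1
        ppage, cap = app_handler(vpage)
        if self.page_caps.get(ppage) != cap:
            raise PermissionError("capability does not match physical page")
        self.hw_tlb[vpage] = ppage  # install, then resume the application
        self.stlb[vpage] = ppage    # assumption: cache it for later misses
```

For example, with `page_table = {5: (42, "cap42")}` as the library’s own page table, the first miss on virtual page 5 upcalls into the handler; after the hardware TLB is flushed, a second miss is served from the STLB without another upcall.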
ExOS – Library OS
IPC abstraction
VM
Remote communication using ASH (application-specific safe handlers)
Takeaway: significant performance improvement is possible compared to a monolithic implementation
The Exokernel
A thin veneer that multiplexes and exports physical resources securely
Simplicity allows efficiency
The lower the level of a primitive, the more efficiently it can be implemented, and the more latitude it grants to implementers of higher-level abstractions.
The Exokernel
Resource management is restricted to allocation, revocation, sharing, and ownership tracking
Library Operating Systems
Use the low-level exokernel interface
Higher-level abstractions
Special-purpose implementations
An application can choose the library that best suits its needs, or even build its own.
Another Example
Design Challenge
How can an exokernel allow libOSes to freely manage physical resources while protecting them from each other?
Track ownership of resources
Secure bindings: a libOS can securely bind to machine resources
Guard all resource usage
Revoke access to resources
Secure Bindings
The exokernel allows libOSes to bind resources using secure bindings
Multiplex resources securely
Protection for mutually distrusting apps
Efficient
Secure Bindings
Secure binding: a protection mechanism that decouples authorization from actual use of a resource
Allows the kernel to protect resources without having to understand them
Guard All Resource Usage
Invisible resource revocation (traditional OS): efficient, since the application layer is not involved
Visible resource revocation (exokernel): allows the libOS to guide deallocation and track availability of resources
Revoke Access to Resources
Abort protocol: allows the exokernel to break the secure bindings of an uncooperative libOS by force
Conclusion
An exokernel securely multiplexes the available raw hardware among applications
Application-level library operating systems implement higher-level traditional OS abstractions
LibOSes can specialize an implementation to suit a particular application
Conclusion
The lower the level of a primitive, the more efficiently it can be implemented and the more latitude it gives to higher-level abstractions
So, separate management from protection: implement protection at a low level (exokernel) and management at a higher level (libOS)
Some Features
It is possible to have different libOSes; for example, one could export a Unix API and another a Windows API
Exokernel vs. Microkernel
A microkernel-based system provides abstractions of the hardware such as files, sockets, graphics, etc. (implemented by servers on top of the microkernel).
An exokernel provides almost raw access to the hardware.
Exokernel
Implementation overview: allows the extension, specialization, and even replacement of abstractions.
Example: page table implementations can vary from libOS to libOS, and applications can choose whichever is most suitable for their needs.
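The page-table example can be sketched as two interchangeable libOS implementations behind the same insert/lookup interface, with the application picking one. Both classes, the 10-bit split, and `make_libos_page_table` are hypothetical and only illustrate that the exokernel does not dictate the data structure.

```python
class FlatPageTable:
    """One libOS's choice: a single hash map from virtual to physical page."""

    def __init__(self):
        self.map = {}

    def insert(self, vpage, ppage):
        self.map[vpage] = ppage

    def lookup(self, vpage):
        return self.map.get(vpage)


class TwoLevelPageTable:
    """Another libOS's choice: a sparse two-level table, split at 10 bits."""

    BITS = 10

    def __init__(self):
        self.top = {}               # high bits -> second-level table

    def _split(self, vpage):
        return vpage >> self.BITS, vpage & ((1 << self.BITS) - 1)

    def insert(self, vpage, ppage):
        hi, lo = self._split(vpage)
        self.top.setdefault(hi, {})[lo] = ppage

    def lookup(self, vpage):
        hi, lo = self._split(vpage)
        return self.top.get(hi, {}).get(lo)


def make_libos_page_table(kind):
    # The application, not the kernel, picks the implementation.
    return {"flat": FlatPageTable, "two_level": TwoLevelPageTable}[kind]()
```

Either table can feed translations to the kernel through the same secure-binding path; only the libOS that chose it needs to know its layout.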
Implementation Principles
Provide libOSes maximum freedom while protecting them from each other. This is achieved by separating protection from resource management.
Resources should be managed only to the extent required for protection. LibOSes decide how best to use resources, with the exokernel arbitrating between competing libraries.
LibOSes should be able to request specific physical resources (e.g., specific physical pages).
Resources should not be allocated implicitly; the libOS should participate in every allocation.
Exokernel Design
Secure bindings
Downloading code
Visible revocation
Abort protocol
Secure Bindings
A protection mechanism that decouples authorization (bind time) from the actual use of the resource (access time).
Authorization is performed at bind time. Bindings are expressed in simple operations that the exokernel can implement quickly and efficiently.
The exokernel can protect resources without understanding them. Example:
When a page fault occurs, the virtual-to-physical address mapping is established and the page is loaded by the exokernel (bind time); the page is then used many times (access time).
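The sketch below is a minimal illustration of the bind-time/access-time split. The full ownership check runs once, in `bind`. Each later access is only a set lookup. It assumes a simple page-ownership table, and all names are hypothetical.

```python
# Hypothetical sketch of a secure binding: authorization happens once at
# bind time; each later access only checks the cached binding.
from typing import Dict, Set, Tuple


class Exokernel:
    def __init__(self) -> None:
        self.owner: Dict[int, str] = {}              # physical page -> owning libOS
        self.bindings: Set[Tuple[str, int]] = set()  # authorized (libOS, page) pairs
        self.auth_checks = 0                         # counts full authorization checks

    def allocate(self, libos: str, page: int) -> None:
        if page in self.owner:
            raise PermissionError(f"page {page} is already owned")
        self.owner[page] = libos

    def bind(self, libos: str, page: int) -> None:
        """Bind time (e.g. on a page fault): full authorization check."""
        self.auth_checks += 1
        if self.owner.get(page) != libos:
            raise PermissionError(f"{libos} does not own page {page}")
        self.bindings.add((libos, page))

    def access(self, libos: str, page: int) -> bool:
        """Access time: a cheap lookup; the kernel never interprets the page."""
        return (libos, page) in self.bindings
```

The kernel only records who owns which page, which reflects the slide's point that it can protect a resource without understanding what the resource holds.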
Downloading Code
Code can be downloaded into the exokernel for execution at defined events (such as packet arrival). Downloaded code:
reduces kernel crossings;
can execute even when the application is not scheduled;
can initiate events (e.g., send a response message to a packet).
Example: a packet filter is downloaded into the exokernel (bind time) and then run on every incoming packet to determine the intended target application (access time); it can even initiate a response.
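This is a minimal sketch of the packet-filter example. An application downloads a filter (and optionally a reply handler), and the demultiplexer runs the filters on each incoming packet. Plain Python callables stand in for the restricted, checkable filter code a real exokernel would accept (Aegis uses a dynamic packet filter). The `Packet` fields and class names are hypothetical.

```python
# Hypothetical sketch: downloaded packet filters run inside the "kernel".
# Plain callables stand in for the safe filter code a real exokernel accepts.
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Packet:
    dst_port: int
    payload: bytes


Filter = Callable[[Packet], bool]
Handler = Callable[[Packet], Optional[Packet]]


class PacketDemux:
    def __init__(self) -> None:
        self.filters: List[Tuple[str, Filter, Optional[Handler]]] = []
        self.queues: Dict[str, List[Packet]] = {}

    def download(self, app: str, filt: Filter, handler: Optional[Handler] = None) -> None:
        """Bind time: install an application's filter and optional reply handler."""
        self.filters.append((app, filt, handler))
        self.queues.setdefault(app, [])

    def receive(self, pkt: Packet) -> Optional[Packet]:
        """Access time: run filters on each packet, even if the app is not scheduled."""
        for app, filt, handler in self.filters:
            if filt(pkt):
                self.queues[app].append(pkt)
                # A downloaded handler can build a response without a kernel crossing.
                return handler(pkt) if handler is not None else None
        return None  # no filter claimed the packet; drop it
```

For example, an echo service could download `lambda p: p.dst_port == 7` together with a handler that returns the payload. The reply is then produced as soon as the packet arrives.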
Visible Resource Revocation
Traditionally, OSes revoke (deallocate) resources invisibly, without application involvement (e.g., physical memory).
Advantage: lower latency. Disadvantage: applications cannot guide deallocation.
The exokernel uses visible revocation for most resources: the library OS is notified of the intention to deallocate and can guide the process.
Example: when a libOS is told that the exokernel will deallocate physical page 5, it can use this information to update its page table, or even suggest a less important page for deallocation.
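The sketch below illustrates the page-5 example under stated assumptions. The libOS answers a revocation request with the page it prefers to lose, using a hypothetical per-page importance score. The exokernel only checks that the offered page is really owned by that libOS.

```python
# Hypothetical sketch of visible revocation: the exokernel names the page it
# wants back; the libOS picks the page it prefers to give up.
from typing import Dict, Set


class LibOS:
    def __init__(self, name: str) -> None:
        self.name = name
        self.page_table: Dict[int, int] = {}  # virtual page -> physical page
        self.importance: Dict[int, int] = {}  # physical page -> hypothetical priority

    def on_revoke(self, requested: int) -> int:
        """Choose a victim (least important; ties favor `requested`) and unmap it."""
        candidates = set(self.page_table.values())
        victim = min(candidates,
                     key=lambda p: (self.importance.get(p, 0), p != requested, p))
        self.page_table = {v: p for v, p in self.page_table.items() if p != victim}
        return victim


class Exokernel:
    def __init__(self) -> None:
        self.owner: Dict[int, str] = {}  # physical page -> owning libOS
        self.free: Set[int] = set()

    def revoke_one(self, libos: LibOS, requested: int) -> int:
        victim = libos.on_revoke(requested)
        # Protection only: the kernel checks ownership, the libOS chose the page.
        if self.owner.get(victim) != libos.name:
            raise PermissionError("libOS offered a page it does not own")
        del self.owner[victim]
        self.free.add(victim)
        return victim
```

In this sketch, asking for page 5 while page 6 is marked less important returns page 6. The libOS's page table is updated before the page is freed.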
Abort Protocol
A mechanism to take away resources when libOSes fail to respond satisfactorily to visible revocation requests.
A repossession vector is used to keep track of forcibly deallocated resources. Library OSes can preload the vector with information used to write out state or data about the resource when it is deallocated (e.g., the disk blocks to use for paging memory out).
OSes normally require certain allocations to be permanent, so the exokernel can guarantee a small number of resources that cannot be forcibly deallocated. Example: page tables, exception areas.
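This is a minimal sketch of the abort protocol. An unresponsive libOS is modeled by an `on_revoke` that returns `False`. The preloaded repossession information is modeled as a hypothetical (libOS, page) to disk-block table, and "disk writes" are only logged. All names are assumptions.

```python
# Hypothetical sketch of the abort protocol: if a libOS ignores a revocation
# request, the exokernel breaks the binding by force and records it.
from typing import Dict, List, Optional, Set, Tuple


class LibOS:
    def __init__(self, name: str, cooperative: bool = True) -> None:
        self.name = name
        self.cooperative = cooperative  # False models a libOS that never answers

    def on_revoke(self, page: int) -> bool:
        return self.cooperative


class Exokernel:
    def __init__(self, guaranteed: Set[int]) -> None:
        self.owner: Dict[int, str] = {}              # page -> libOS name
        self.guaranteed = set(guaranteed)            # e.g. page tables, exception areas
        self.hints: Dict[Tuple[str, int], int] = {}  # preloaded (libOS, page) -> disk block
        self.repossession_vector: Dict[str, List[Tuple[int, Optional[int]]]] = {}
        self.disk_writes: List[Tuple[int, int]] = [] # logged (page, block) saves

    def preload(self, libos: str, page: int, block: int) -> None:
        self.hints[(libos, page)] = block

    def revoke(self, libos: LibOS, page: int) -> str:
        if self.owner.get(page) != libos.name:
            raise ValueError("page not owned by this libOS")
        if libos.on_revoke(page):        # visible revocation succeeded
            del self.owner[page]
            return "released"
        if page in self.guaranteed:      # never forcibly deallocated
            return "kept"
        # Abort protocol: save the contents where the libOS asked, then repossess.
        block = self.hints.pop((libos.name, page), None)
        if block is not None:
            self.disk_writes.append((page, block))
        self.repossession_vector.setdefault(libos.name, []).append((page, block))
        del self.owner[page]
        return "repossessed"
```

When a libOS later runs again, its repossession vector tells it which resources it lost and where their contents were saved.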
Implementation
Aegis: the exokernel
Exports: processor, physical memory, TLB, exceptions, interrupts, and network interface.
ExOS: the library OS
Implements: processes, virtual memory, user-level exceptions, interprocess abstractions, and network protocols (ARP, IP, UDP, NFS)
Compared with Ultrix
Aegis Processor Time Slices
Time slices are partitioned and allocated at clock granularity and scheduled round-robin. Advanced scheduling can be implemented by a libOS by requesting specific positions in the time slices.
Long-running applications can allocate contiguous time slices, while interactive applications can allocate several equidistant slices.
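The following is a minimal sketch of a time-slice vector. Each slot stands for one clock-granularity slice, a libOS requests specific positions, and the kernel walks the vector round-robin. The vector length and the helper names are hypothetical.

```python
# Hypothetical sketch of an Aegis-style time-slice vector.
from typing import List, Optional


class SliceVector:
    def __init__(self, num_slices: int) -> None:
        self.slots: List[Optional[str]] = [None] * num_slices

    def allocate(self, owner: str, positions: List[int]) -> None:
        """A libOS asks for specific slice positions."""
        if any(self.slots[p] is not None for p in positions):
            raise ValueError("requested slice already taken")
        for p in positions:
            self.slots[p] = owner

    def schedule(self, rounds: int = 1) -> List[Optional[str]]:
        """Owners in the order they run; None marks an unallocated (idle) slice."""
        n = len(self.slots)
        return [self.slots[i % n] for i in range(rounds * n)]


def contiguous(start: int, count: int) -> List[int]:
    """Back-to-back slices, e.g. for a long-running batch job."""
    return list(range(start, start + count))


def equidistant(start: int, count: int, num_slices: int) -> List[int]:
    """Evenly spaced slices, e.g. for an interactive application."""
    step = num_slices // count
    return [(start + i * step) % num_slices for i in range(count)]
```

With 8 slices, a batch job holding slots 0 to 2 runs without interruption. An interactive job holding slots 3 and 7 gets the CPU every 4 slices, which bounds its response latency.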
Aegis
Exceptions
Interrupts
Address translations: guarantees address mappings for a small number of pages, to simplify bootstrapping.
Protected control transfers: for IPC abstractions; changes the program counter to an agreed location, sets up the appropriate context for the callee, and donates the current time slice.
Dynamic packet filter
ExOS
IPC abstractions
pipe: ExOS uses a shared-memory buffer and is an order of magnitude faster than Ultrix, which uses standard Unix pipes.
Application-level virtual memory
150×150 integer matrix multiplication (which uses no special ExOS or Aegis abilities) shows that application-level VM does not incur noticeable overhead (0.1 second difference).
All other tests perform comparably with Ultrix (reading pages, flipping protection bits, etc.).
Downloaded code for networking handler: round-trip latency for RPC is faster than FRPC.
ExOS Extensibility
Extensible page-table structures
Implemented inverted page tables
Extensible schedulers
Stride scheduling (proportional-share scheduling): the processes were successfully scheduled at a ratio of 3:2:1
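This is a minimal sketch of stride scheduling (Waldspurger and Weihl's proportional-share scheme) reproducing a 3:2:1 split. Each client's stride is a large constant divided by its tickets. The client with the lowest "pass" value runs next, and its pass then advances by its stride. The constant `STRIDE1` and the tie-breaking by insertion order are assumptions of this sketch.

```python
# Sketch of stride scheduling: lowest pass value runs, then advances by its stride.
import heapq
from collections import Counter
from typing import Dict, List, Tuple

STRIDE1 = 720720  # assumed constant; divisible by 1..16, so strides stay exact


def stride_schedule(tickets: Dict[str, int], quanta: int) -> Counter:
    """Run `quanta` time slices and count how many each client received."""
    strides = {name: STRIDE1 // t for name, t in tickets.items()}
    # Heap of (pass value, tie-break index, client): the smallest pass runs next.
    heap: List[Tuple[int, int, str]] = [(0, i, name) for i, name in enumerate(tickets)]
    heapq.heapify(heap)
    runs: Counter = Counter()
    for _ in range(quanta):
        pass_value, i, name = heapq.heappop(heap)
        runs[name] += 1
        heapq.heappush(heap, (pass_value + strides[name], i, name))
    return runs
```

With tickets 3, 2, and 1, the schedule repeats every 6 quanta, so 600 quanta give exactly 300, 200, and 100 slices.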
Conclusion
Experiments with Aegis and ExOS show that:
simple exokernel primitives can be implemented efficiently;
fast low-level hardware multiplexing can be implemented efficiently;
traditional OS abstractions can be implemented at user level;
applications can create special-purpose implementations by modifying libraries.
Other Exokernel Work
"Porting Multithreading Libraries to an Exokernel System", Ernest Artiaga, Albert Serra, Marisa Gil. Dept. of Computer Architecture, Universitat Politecnica de Catalunya. ACM SIGOPS European Workshop, ACM 2000, pp. 121-126.
Ported Cthreads to the exokernel: slightly faster execution than without threading
Other Exokernel Work
"Fast and Flexible Application-Level Networking on Exokernel Systems", Gregory Ganger, Dawson Engler, et al. CMU, Stanford, MIT and Vividon, Inc. ACM Transactions on Computer Systems, vol. 20, no. 1, pp. 49-83, 2002.
Implemented TCP, an HTTP server, and a web benchmarking tool
TCP: 50-300% higher throughput
HTTP: 3-8 times higher throughput
Benchmarking: can produce loads 2-8 times heavier
Key points of the paper
A microkernel should provide minimal abstractions: address spaces, threads, IPC
Abstractions are machine independent, but their implementation is hardware dependent for performance
Myths about the inefficiency of microkernels stem from inefficient implementations and NOT from the microkernel approach
What abstractions?
Determining criterion: functionality, not performance
Hardware and microkernel should be trusted, but applications are not
Hardware provides page-based virtual memory
The kernel builds on this to provide protection for services above and outside the microkernel
Principles of independence and integrity:
Subsystems are independent of one another
The integrity of channels between subsystems is protected from other subsystems
Microkernel Concepts
Hardware provides address-space mapping from a virtual page to a physical page, implemented by page tables and the TLB
The microkernel concept of address spaces hides the hardware address spaces and provides an abstraction that supports three operations: grant, map, and flush
These primitives allow building a hierarchy of protected address spaces
Address spaces
[Figure: map, grant, and flush of a page R among address spaces A1, A2, and A3. A1's page at (P1, v1) is mapped into A2 at (P2, v2); A2 then grants it to A3 at (P3, v3), leaving A2's entry NIL; after a flush, A3's entry is NIL and only (P1, v1) remains.]
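The sketch below is a minimal model of the three primitives, following the paper's description. Map shares a page and the mapper keeps it. Grant moves a page out of the granter's space. Flush removes a page from every space that received it, directly or indirectly, from the flusher, while the flusher keeps its own copy. The class, the `install_root` stand-in for the initial space, and the omission of the recipient's consent are assumptions of this sketch.

```python
# Hypothetical model of recursive address spaces built with map/grant/flush.
from typing import Dict, List, Optional, Tuple

Key = Tuple[str, int]  # (address space, virtual page)


class MicroKernel:
    def __init__(self) -> None:
        self.frame: Dict[Key, int] = {}             # mapping -> physical frame
        self.parent: Dict[Key, Optional[Key]] = {}  # mapping it was derived from
        self.children: Dict[Key, List[Key]] = {}    # mappings derived from it

    def install_root(self, space: str, page: int, frame: int) -> None:
        """Initial space that owns physical memory 'by magic'."""
        key = (space, page)
        self.frame[key], self.parent[key], self.children[key] = frame, None, []

    def _check(self, s: Key, d: Key) -> None:
        if s not in self.frame:
            raise KeyError(f"{s} is not mapped")
        if d in self.frame:
            raise ValueError(f"{d} is already mapped")

    def map(self, src: str, v: int, dst: str, w: int) -> None:
        """Share the page: the mapper keeps it, the receiver gets a derived mapping."""
        s, d = (src, v), (dst, w)
        self._check(s, d)
        self.frame[d], self.parent[d], self.children[d] = self.frame[s], s, []
        self.children[s].append(d)

    def grant(self, src: str, v: int, dst: str, w: int) -> None:
        """Move the page: the receiver takes the granter's place in the tree."""
        s, d = (src, v), (dst, w)
        self._check(s, d)
        self.frame[d] = self.frame.pop(s)
        self.children[d] = self.children.pop(s)
        for child in self.children[d]:
            self.parent[child] = d
        p = self.parent.pop(s)
        self.parent[d] = p
        if p is not None:
            siblings = self.children[p]
            siblings[siblings.index(s)] = d

    def flush(self, space: str, v: int) -> None:
        """Remove every mapping derived from (space, v); the flusher keeps its own."""
        s = (space, v)
        for child in self.children[s]:
            self._remove_subtree(child)
        self.children[s] = []

    def _remove_subtree(self, key: Key) -> None:
        for child in self.children.pop(key):
            self._remove_subtree(child)
        del self.frame[key]
        del self.parent[key]

    def lookup(self, space: str, v: int) -> Optional[int]:
        return self.frame.get((space, v))
```

Replaying the figure: A1 maps its page to A2, A2 grants it to A3, and a flush by A1 then leaves only A1's mapping.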
Power and flexibility of address spaces
The initial memory manager for address space A0 appears by magic (similar to the SPIN core service, BUT outside the kernel) and encompasses the physical memory
Allows the creation of stackable memory managers (all outside the kernel)
Pagers can be part of a memory manager or outside it
All address-space changes (map, grant, flush) are orchestrated via the kernel for protection
Device drivers can be implemented as special memory managers outside the kernel as well
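[Figure: memory managers stacked above the microkernel and processor: M0, A0, P0 at the bottom, then M1, A1, P1 and M2, A2, P2, each with its own page table (PT), connected by map/grant operations.]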
Threads and IPC
A thread executes in an address space
It has a PC, SP, processor registers, and state information (such as its address space)
IPC is cross-address-space communication, supported by the microkernel
The classic method is message passing between threads via the kernel
The sender sends information; the receiver decides whether it wants to receive it, and if so, where
Address-space operations such as map, grant, and flush need IPC
Higher-level communication (e.g., RPC) is built on top of basic IPC
Interrupts?
Each hardware device is a thread from the kernel's perspective
An interrupt is a null message from a hardware thread to the software thread
The kernel transforms a hardware interrupt into a message
It does not know or care about the semantics of the interrupt
Device-specific interrupt handling happens outside the kernel
Clearing hardware state (if privileged) is then carried out by the kernel upon the driver thread's next IPC
TLB handler?
In theory, a software TLB handler can be outside the microkernel
In practice, the first-level TLB handler is inside the microkernel or in hardware
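The following is a minimal sketch of interrupts delivered as IPC. The kernel turns each interrupt into a null message from the device's "hardware thread" to the driver thread registered for it. It clears the privileged hardware state only when that driver makes its next IPC. The class, method names, and the logged "clear" are hypothetical.

```python
# Hypothetical sketch: interrupts become null IPC messages to driver threads.
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, List, Optional, Set


@dataclass
class Message:
    irq: int              # identifies the sending hardware thread
    payload: bytes = b""  # an interrupt carries a null (empty) message


class Kernel:
    def __init__(self) -> None:
        self.driver_for: Dict[int, str] = {}  # irq -> driver thread
        self.inbox: Dict[str, Deque[Message]] = {}
        self.delivered: Set[int] = set()      # irqs handed to a driver, not yet cleared
        self.cleared: List[int] = []          # log of privileged hardware-state clears

    def associate(self, irq: int, driver: str) -> None:
        self.driver_for[irq] = driver
        self.inbox.setdefault(driver, deque())

    def hardware_interrupt(self, irq: int) -> None:
        # No interpretation: just queue a null message for the registered driver.
        self.inbox[self.driver_for[irq]].append(Message(irq))

    def ipc_wait(self, driver: str) -> Optional[Message]:
        # The driver's next IPC signals it has handled earlier interrupts,
        # so the kernel now clears their (privileged) hardware state.
        for irq in sorted(self.delivered):
            if self.driver_for[irq] == driver:
                self.delivered.discard(irq)
                self.cleared.append(irq)
        if not self.inbox[driver]:
            return None
        msg = self.inbox[driver].popleft()
        self.delivered.add(msg.irq)
        return msg
```

All device-specific work happens in the driver between two `ipc_wait` calls. The kernel's part stays independent of the device.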
Unique IDs
The kernel provides IDs that are unique over space and time for threads and IPC channels
Breaking some performance myths
Kernel-user switches
Address-space switches
Thread switches and IPC
Memory effects
Base system: 486 (50 MHz), 20 ns cycle time
Kernel-user switches
Machine instructions for entering and exiting the kernel: 107 cycles
Mach measures 900 cycles for a kernel-user switch. Why?
Empirical proof: the L3 kernel takes ~123 cycles (accounting for some TLB and cache misses)
Where did the remaining 800 cycles go in Mach?
Kernel overhead, due to the construction of the kernel rather than inherent in the approach
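Converting the slide's cycle counts to time at the 486's 20 ns cycle makes the gap concrete. This is plain arithmetic on the numbers above. It reads the "remaining 800 cycles" as Mach's 900 cycles minus the 107-cycle hardware minimum, which is an interpretation of the slide.

```latex
\begin{aligned}
t_{\text{cycle}} &= \frac{1}{50\,\text{MHz}} = 20\,\text{ns}\\
t_{\text{instr}} &= 107 \times 20\,\text{ns} = 2.14\,\mu\text{s}\\
t_{\text{L3}} &= 123 \times 20\,\text{ns} = 2.46\,\mu\text{s}\\
t_{\text{Mach}} &= 900 \times 20\,\text{ns} = 18\,\mu\text{s}\\
900 - 107 &= 793 \approx 800\ \text{cycles} \approx 15.9\,\mu\text{s of overhead}
\end{aligned}
```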
Address space switches
Primer on TLBs:
Address-space-tagged TLB (MIPS R4000) vs. untagged TLB (486)
An untagged TLB requires a flush on every address-space switch
Instruction and data caches:
Usually physically tagged in most modern processors, so a TLB flush has no effect on them
Address-space switch:
A complete reload of the Pentium TLB costs ~864 cycles
Do we always need a TLB flush?
It is an implementation issue of "protection domains"
SPIN implements protection domains as Modula names within a single hardware address space
Liedtke suggests a similar approach in the microkernel, in an architecture-specific manner:
PowerPC: use segment registers => no flush
Pentium or 486: share the linear hardware address space among several user address spaces => no flush
There are some caveats in terms of the size of each user space and how many can be "packed" into a 2^32 global space
Upshot?
Address-space switching among medium or small protection domains can ALWAYS be made efficient by careful construction of the microkernel
Switches between large address spaces are ALWAYS going to be expensive due to cache and TLB effects, so the switching cost itself is not the most critical issue
Thread switches and IPC
A segment switch (instead of an address-space switch) makes cross-domain calls cheap
Memory Effects: System
Capacity-induced MCPI (memory cycles per instruction)
Portability vs. Performance
A microkernel built on top of abstract hardware, while portable:
cannot exploit hardware features
cannot take precautions to avoid performance problems specific to an architecture
incurs a performance penalty due to the abstraction layer
Examples of non-portability
Same processor family:
Address-space switch implementation: the TLB flush method is preferable for the 486; the segment register switch is preferable for the Pentium
=> 50% change of the microkernel!
IPC implementation: details of the cache layout (associativity) require different handling of IPC buffers on the 486 and the Pentium
Incompatible processors:
Exokernel on the R4000 (tagged TLB) vs. the 486 (untagged TLB)
=> Microkernels are inherently non-portable
Summary
A minimal set of abstractions in the microkernel
Microkernels are processor specific (at least in implementation) and non-portable
The right abstractions and a processor-specific implementation lead to efficient, processor-independent abstractions at higher layers
Performance
Key points
Goal: extensibility akin to the SPIN and Exokernel goals
Main difference: support running several commodity operating systems on the same hardware simultaneously without sacrificing performance or functionality
Why?
Application mobility
Server consolidation
Co-located hosting facilities
Distributed web services
…
Multiprocessor OS
Synchronization
Communication
Scheduling
We have already seen these issues in the other readings in this section of the course
Key Issues
Modern parallel machines:
Large system sizes stress bottlenecks in system software (e.g., global data structures)
Higher memory latencies
NUMA effects (i.e., the symmetric assumption does not hold)
Cache hierarchy:
Write sharing is expensive due to coherence traffic
False sharing due to large cache lines
Thesis of the Tornado paper
In designing a multiprocessor OS:
Pay attention to locality
Reduce shared system data structures
Reduce the distance between the accessing processor and the target memory module
Effect of a global data structure: shared counter
Tornado design approach
Object-oriented design for scalability:
Clustered objects
Protected procedure call, with a view to preserving locality while ensuring concurrency
Semi-automatic garbage collection for localizing locking
OS objects have multiple implementations:
A low-overhead version when scalability is not required
Resort to a scalable implementation when performance is critical
Optimize the common case:
Object invocation should be fast; object creation/destruction can be slower
Page fault handling should be fast; memory region creation/deletion can be slower
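The sketch below contrasts the shared counter with a clustered one. It assumes one eagerly created representative per processor and passes the processor ID explicitly. Python cannot show cache effects; the point is which data each operation touches. The common case (increment) stays local, and the rare case (read) visits every representative. The class names are hypothetical.

```python
# Hypothetical sketch: shared counter vs. clustered (per-processor) counter.
from typing import List


class SharedCounter:
    """One global value: every increment, on every CPU, writes the same location."""

    def __init__(self) -> None:
        self.value = 0

    def increment(self, cpu: int) -> None:
        self.value += 1

    def read(self) -> int:
        return self.value


class ClusteredCounter:
    """One logical counter backed by a representative per processor."""

    def __init__(self, num_cpus: int) -> None:
        # A real implementation would pad each rep to its own cache line
        # to avoid false sharing; a plain list is used here for clarity.
        self.reps: List[int] = [0] * num_cpus

    def increment(self, cpu: int) -> None:
        self.reps[cpu] += 1  # common case: touches only the local rep

    def read(self) -> int:
        return sum(self.reps)  # rarer case: visits every rep
```

Both objects give the same answer to `read()`. Only the clustered version keeps increments free of write sharing between processors.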
Next Lecture
Process and Thread
"Cooperative Task Management Without Manual Stack Management", Atul Adya, et al.
"Capriccio: Scalable Threads for Internet Services", Rob von Behren, et al.
"The Performance Implications of Thread Management Alternatives for Shared-Memory Multiprocessors", Thomas E. Anderson, et al.