
Univ. of Tehran Distributed Operating Systems


Advanced Operating Systems

University of Tehran, Dept. of EE and Computer Engineering

By: Dr. Nasser Yazdani

Lecture 3: OS design



How to design an OS
Some general guidelines and experiences.

References

“The Computer for the 21st Century”, Mark Weiser

“Exokernel: An Operating System Architecture for Application-Level Resource Management”, Dawson R. Engler, M. Frans Kaashoek, et al.

“On Micro-Kernel Construction”



Outline
New applications/requirements
Organizing operating systems
Some microkernel examples
Object-oriented organizations: Spring
Organization for multiprocessors



New vision
Two important problems: location and scale.
Ubiquitous computing: tiny kernels of functionality
Virtual reality
Mobility
Intelligent devices
Distributed computing: make networks appear like disks, memory, or other non-networked devices.



Ubiquitous computing
Transparent computing is the ultimate goal: computers should disappear into the background, and computation becomes part of the environment.
Computing everywhere: desktops, laptops, palmtops; cars, cell phones; shoes, clothing, walls (paper/paint)
Connectivity everywhere: broadband, wireless
Mobile everywhere: users move around; disposable devices



Ubiquitous Computing Structure
Resource and service discovery is critical
User location is an issue
Interface discovery
Disconnected operation
Ad-hoc organization
Security
Small devices with limited power
Intermittent connectivity
Agents
Sensor networks



Grid Computing
Federated system: no single controlling authority
Scheduling of processors, bandwidth, and other resources
Policy is an important issue: reliability, security, who can use which resources, and what one is willing to use
Systems: Globus Toolkit, Condor
Related but not grid: CORBA, DCOM, DCE
Applications: distributed supercomputing


Peer-to-Peer Computing
Locating cooperative elements
Scalability
OS support
Security
Policies



P2P File Sharing Issues
Naming
Data discovery
Availability
Security
Encryption
Fault tolerance
Conflict resolution
Replication



Other Peer-to-Peer Technologies
Ad-hoc networking
Untrusted nodes are used to relay messages
Multiple routes (distributed and replicated)
Extends range, reduces power, and increases aggregate bandwidth
But increases latency and makes management more difficult
Sensor networks
An application of ad-hoc networking
Add processing/data reduction inside the network



What is the big deal?
Performance: border crossings are expensive
Change in locality
Copying between user and kernel buffers
Application requirements differ in terms of resource management



Operating System Organization

What is the best way to design an operating system?

Put another way, what are the important software characteristics of an OS?

What should be in the OS kernel and what in applications, i.e., how should functionality be partitioned? Is there a minimal set for the kernel?



Important OS Software Characteristics
Correctness and simplicity
Power and completeness
Performance
Extensibility and portability
Flexibility
Scalability
Suitability for distributed and parallel systems
Compatibility with existing systems
Security and fault tolerance


Common OS Organizations
Monolithic
Virtual machine
Layered designs
Kernel designs
Microkernels
Object-oriented
Note that individual OS components can also be organized in these ways.
There is a trade-off between generality and specialization.



What are we shooting for?

The OS should be thin (like a microkernel), providing only mechanisms and not embodying policies (i.e., management).

Fine-grained access to system resources while avoiding border crossings as much as possible (like DOS).

Allow flexible extensions for resource management (like a microkernel) without sacrificing safety (like a monolithic kernel).
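The mechanism/policy split can be sketched as a kernel that only dispatches and validates, while the decision of which thread runs comes from a pluggable policy. This is a minimal illustration under assumed names (`Kernel`, `dispatch`, `fifo`, `prefer_interactive` are all hypothetical), not the design of any particular OS.

```python
from typing import Callable, Sequence

# A policy is any function that picks one thread from the runnable set.
Policy = Callable[[Sequence[str]], str]


class Kernel:
    """Keeps only the mechanism: the run queue and the act of dispatching."""

    def __init__(self, policy: Policy) -> None:
        self.runnable: list[str] = []
        self.policy = policy  # management decision supplied from outside the kernel

    def dispatch(self) -> str:
        choice = self.policy(tuple(self.runnable))
        # Safety stays in the kernel: an extension may only select a runnable thread.
        if choice not in self.runnable:
            raise ValueError(f"policy chose non-runnable thread {choice!r}")
        return choice


def fifo(threads: Sequence[str]) -> str:
    return threads[0]


def prefer_interactive(threads: Sequence[str]) -> str:
    # Hypothetical convention: interactive threads have names starting with "ui".
    return next((t for t in threads if t.startswith("ui")), threads[0])
```

Swapping `fifo` for `prefer_interactive` changes management without touching the kernel, and a buggy policy that names an unknown thread is rejected by the check in `dispatch`.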


Monolithic OS Design
Build the OS as a single combined module
Hopefully using data abstraction, compartmentalized functions, etc.
The OS lives in its own, single address space
Examples: DOS, early Unix systems, most VFS file systems


Pros/Cons of Monolithic OS Organization
+ Highly adaptable (at first . . .)
+ Little planning required
+ Potentially good performance
– Hard to extend and change
– Eventually becomes extremely complex
– Eventually performance becomes poor
– Highly prone to bugs


Virtual Machine Organizations

A base operating system provides services in a very generic way

One or more other operating systems live on top of the base system, using the services it provides to offer different views of the system to users

Examples: IBM’s VM/370, the Java interpreter


Pros/Cons of Virtual Machine Organizations
+ Allows multiple OS personalities on a single machine
+ Good OS development environment
+ Can provide good portability of applications
– Significant performance problems
– Especially if more than 2 layers
– Lacking in flexibility



Old idea: VM/370

Virtualization for binary support of legacy apps

Why the resurgence today? Companies want a share of everybody’s pie

IBM zSeries “mainframes” support virtualization for server consolidation

Enables billing and performance isolation while hosting several customers

Microsoft has announced virtualization plans to allow easy upgrades and hosting Linux!

You can see the dots connecting up: from extensibility (à la SPIN) to virtualization



Possible virtualization approaches
Standard OS (such as Linux, Windows)
Meta services (such as grid) for users to install files and run processes
Administration, accountability, and performance isolation become hard
Retrofit performance isolation into OSs: Linux/RK, QLinux, SILK
Accounting for resource usage correctly can be an issue unless done at the lowest level (e.g., Exokernel)
Xen approach
Multiplex physical resources at OS granularity



Full virtualization
Virtual hardware is identical to the real hardware
Relies on the hosted OS trapping to the VMM on privileged instructions
Pros: runs unmodified OS binaries on top
Cons:
Supervisor instructions can fail silently on some hardware platforms (e.g., x86)
Solution in VMware: dynamically rewrite portions of the hosted OS to insert traps
The hosted OS sometimes needs to see real resources: real time, page-coloring tricks for optimizing performance, etc.
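Trap-and-emulate, and why some x86 instructions break it, can be sketched with a toy CPU. Assumptions: the instruction names `cli` (privileged, traps in user mode) and `popf_cli` (sensitive but unprivileged, silently ignored in user mode, standing in for an x86 `POPF`-style case) are simplifications, and `rewrite` is only a cartoon of VMware-style dynamic rewriting.

```python
from dataclasses import dataclass, field


class Trap(Exception):
    """Raised by the toy CPU when a privileged instruction runs in user mode."""


@dataclass
class CPU:
    user_mode: bool = True          # the guest kernel is deprivileged under a VMM
    interrupts_enabled: bool = True

    def execute(self, instr: str) -> None:
        if instr == "cli":
            # Privileged: traps unless running in supervisor mode.
            if self.user_mode:
                raise Trap(instr)
            self.interrupts_enabled = False
        elif instr == "popf_cli":
            # Sensitive but NOT privileged: in user mode the change is silently dropped.
            if not self.user_mode:
                self.interrupts_enabled = False
        # Every other instruction is treated as a harmless no-op here.


@dataclass
class VMM:
    virtual_if: bool = True         # the guest's virtual interrupt flag
    emulated: list = field(default_factory=list)

    def run(self, cpu: CPU, code: list) -> None:
        for instr in code:
            try:
                cpu.execute(instr)
            except Trap as trap:
                # Trap-and-emulate: apply the effect to virtual state, not real hardware.
                self.emulated.append(trap.args[0])
                self.virtual_if = False


def rewrite(code: list) -> list:
    """Rewriting cartoon: replace silently failing instructions with trapping ones."""
    return ["cli" if instr == "popf_cli" else instr for instr in code]
```

Running `["add", "popf_cli"]` directly leaves `virtual_if` at `True` even though the guest believes it disabled interrupts (the silent failure); running the rewritten code lets the VMM see and emulate the operation.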



Xen principles
Support for unmodified application binaries
Support for multi-application OSs
Complex server configurations within a single OS instance
Paravirtualization for strong resource isolation on uncooperative hardware (x86)
Paravirtualization to enable optimizing guest OS performance and correctness



Xen: VM management
What would make VM virtualization easy:
A software-managed TLB
A tagged TLB, so there is no TLB flush on a context switch
x86 has neither
Xen approach
The guest OS is responsible for allocating and managing the hardware page tables
Xen occupies the top 64MB of every address space.

Why?
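Why a tagged TLB helps can be seen by counting misses when two address spaces alternate. This toy model (the `TLB` class, `count_misses`, and the page numbers are all made up) treats each miss as a page-table walk and ignores TLB capacity.

```python
class TLB:
    """Toy TLB with unlimited capacity; tagged entries carry an address-space ID."""

    def __init__(self, tagged: bool) -> None:
        self.tagged = tagged
        self.entries = {}      # (asid, vpn) -> pfn if tagged, else vpn -> pfn
        self.misses = 0
        self.asid = None

    def switch_to(self, asid: int) -> None:
        if not self.tagged:
            # Untagged entries cannot tell address spaces apart, so all must go.
            self.entries.clear()
        self.asid = asid

    def translate(self, vpn: int, page_table: dict) -> int:
        key = (self.asid, vpn) if self.tagged else vpn
        if key not in self.entries:
            self.misses += 1                      # simulate a page-table walk
            self.entries[key] = page_table[vpn]
        return self.entries[key]


def count_misses(tagged: bool, rounds: int = 10, pages: int = 4) -> int:
    page_tables = {1: {v: 100 + v for v in range(pages)},
                   2: {v: 200 + v for v in range(pages)}}
    tlb = TLB(tagged)
    for r in range(rounds):
        asid = 1 if r % 2 == 0 else 2            # alternate between two address spaces
        tlb.switch_to(asid)
        for vpn in range(pages):
            # Sanity check: never return the other address space's frame.
            assert tlb.translate(vpn, page_tables[asid]) == page_tables[asid][vpn]
    return tlb.misses
```

With the defaults, the untagged TLB misses on every access (40 misses), while the tagged one takes only the 8 compulsory misses; the difference is the flush cost that an untagged TLB like x86's imposes on every address-space switch.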


Layered OS Design
Design a tiny innermost layer of software
The next layer out provides more functionality, using services provided by the inner layer
Continue adding layers until all required functionality has been provided
Examples: Multics, Fluke, layered file systems and communication protocols


Pros/Cons of Layered Organization
+ More structured and extensible
+ Easy model and development
– Performance: layer crossings can be expensive
– In some cases, unnecessary layers and duplicated functionality
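A request in a strictly layered design passes through every layer below the one it enters, which is where the crossing cost comes from. A minimal sketch, with hypothetical layer names and a `trace` list standing in for the cost of each hop:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Layer:
    """One layer of a toy layered OS; it may call only the layer directly below it."""
    name: str
    below: Optional["Layer"] = None

    def call(self, request: str, trace: list) -> str:
        trace.append(self.name)                 # record every layer the request visits
        if self.below is None:
            return f"{request} handled by {self.name}"  # innermost layer does the work
        return self.below.call(request, trace)


def build(names: list) -> Layer:
    """Stack layers so that names[0] is innermost and names[-1] is outermost."""
    layer = None
    for name in names:
        layer = Layer(name, layer)
    return layer
```

A read entering at the outermost of five layers makes four crossings before any work is done; merging layers cuts crossings but gives up some of the structure.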


Kernel OS Designs
Similar to layers, but with only two OS layers: kernel OS services and non-kernel OS services

Move certain functionality outside the kernel: file systems, libraries

Unlike virtual machines, the kernel doesn’t stand alone

Examples: most modern Unix systems


Pros/Cons of Kernel OS Organization
+ Many advantages of layering, without the disadvantage of too many layers
+ Easier to demonstrate correctness
– Not as general as layering
– Offers no organizing principle for other parts of the OS, user services
– Kernels tend to grow into monoliths


Object-Oriented OS Design

Design the internals of the OS as a set of privileged objects, using OO methods

Sometimes extended into application space

Tends to lead to a client/server style of computing

Examples: Mach (internally), Spring (totally)



Object-Oriented Organizations

Object-oriented organization is increasingly popular

It is well suited to OS development in some ways:
OSes manage important data structures
OSes are modularizable
Strong interfaces are good in OSes



Object-Orientation and Extensibility

One of the main advantages of object-oriented programming is extensibility

Operating systems increasingly need extensibility

So, again, object-oriented techniques are a good match for operating system design
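The extensibility argument can be illustrated with a toy block-device interface: a new implementation is added by subclassing or wrapping, and client code written against the interface keeps working. All class and function names here are hypothetical.

```python
from abc import ABC, abstractmethod


class BlockDevice(ABC):
    """Strong interface: clients depend only on these two methods."""

    @abstractmethod
    def read(self, block: int) -> bytes: ...

    @abstractmethod
    def write(self, block: int, data: bytes) -> None: ...


class RamDisk(BlockDevice):
    def __init__(self) -> None:
        self.blocks = {}

    def read(self, block: int) -> bytes:
        return self.blocks.get(block, bytes(4))   # unwritten blocks read as zeros

    def write(self, block: int, data: bytes) -> None:
        self.blocks[block] = data


class LoggingDisk(BlockDevice):
    """An extension added later by wrapping an existing object; clients are unchanged."""

    def __init__(self, inner: BlockDevice) -> None:
        self.inner = inner
        self.log = []

    def read(self, block: int) -> bytes:
        self.log.append(("read", block))
        return self.inner.read(block)

    def write(self, block: int, data: bytes) -> None:
        self.log.append(("write", block))
        self.inner.write(block, data)


def copy_block(dev: BlockDevice, src: int, dst: int) -> None:
    """Client code written once against the interface."""
    dev.write(dst, dev.read(src))
```

`copy_block` works unchanged on a plain `RamDisk` or on a `LoggingDisk` wrapping one; the cost is an extra indirection per call, which is the performance concern raised for OO organizations.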



How object-oriented should an OS be?

Many OSes have been built with object-oriented techniques, e.g., Mach and Windows NT

But most of them leave object orientation at the microkernel boundary, with no attempt to force object orientation on out-of-kernel modules


Pros/Cons of Object-Oriented OS Organization
+ Offers an organizational model for the entire system
+ Easily divides the system into pieces
+ Good hooks for security
– Can be a limiting model
– Must watch for performance problems
Not widely used yet


Microkernel OS Design
Like kernels, but fewer abstractions are exported (threads, address spaces, communication channels)

Try to include only a small set of required services in the microkernel

Moves even more out of the innermost OS part, like parts of VM, IPC, paging, etc.

System services (e.g., the VM manager) are implemented as servers on top

High communication overhead between services implemented at user level and the microkernel limits extensibility in practice

Examples: Mach, Amoeba, Plan 9, Windows NT, Chorus, Spring, etc.


Pros/Cons of Microkernel Organization
+ Those of kernels, plus:
+ Minimizes code for the most important OS services
+ Offers a model for the entire system
– Microkernels tend to grow into kernels
– Requires very careful initial design choices
– Serious danger of bad performance



Organizing the Total System

In microkernel organizations, much of the OS is outside the microkernel

But that doesn’t answer the question of how the system as a whole gets organized

How do you fit the components together to build an integrated system, while maintaining all the advantages of the microkernel?


Some Important Microkernel Designs

Micro-ness is in the eye of the beholder

Mach, Spring, Amoeba, Plan 9, Windows NT


Mach
Mach didn’t start life as a microkernel; it became one in Mach 3.0
Object-oriented internally; doesn’t force OO at higher levels
Microkernel focus is on communication facilities
Much concern with parallel/distributed systems


Mach Model

[Figure: user processes in user space run on a software emulation layer (4.3BSD, SysV, HP/UX, and other emulators), which runs on the Mach microkernel in kernel space.]


What’s in the Mach Microkernel?
Tasks and threads
Ports and port sets
Messages
Memory objects
Device support
Multiprocessor/distributed support


Mach Tasks
An execution environment providing the basic unit of resource allocation
Contains:
A virtual address space
A port set
One or more threads


Mach Task Model

[Figure: a task with its address space, thread, process port, bootstrap port, exception port, and registered ports, shown across user space and the kernel.]


Mach Threads
The basic unit of Mach execution
Runs in the context of one task
All threads in one task share its resources
A Unix process is similar to a Mach task with a single thread


Task and Thread Scheduling
Very flexible
Controllable by the kernel or by user-level programs
Threads of a single task can execute in parallel, on a single processor or on multiple processors
User-level scheduling can extend to multiprocessor scheduling


Mach Ports
The basic Mach object reference mechanism
A kernel-protected communication channel
Tasks communicate by sending messages to ports
Threads in receiving tasks pull messages off a queue
Ports are location independent
Port queues are protected by the kernel and bounded



Port Rights
The mechanism by which tasks control who may talk to their ports
The kernel prevents messages from being sent to a port unless the sender holds the appropriate port right
Port rights also control which single task receives on a port
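Kernel-checked port rights and a bounded port queue can be sketched as follows. This is a hypothetical model, not Mach's API: `ToyKernel`, `allocate_port`, `grant_send`, and the string rights `"send"`/`"receive"` are invented for illustration, and a full queue simply rejects the message instead of blocking the sender.

```python
from collections import deque


class ToyKernel:
    """Owns every port queue and the table of port rights."""

    def __init__(self, queue_limit: int = 4) -> None:
        self.queues = {}      # port -> deque of messages
        self.rights = {}      # (task, port) -> set of rights, e.g. {"send", "receive"}
        self.queue_limit = queue_limit

    def _has(self, task: str, port: int, right: str) -> bool:
        return right in self.rights.get((task, port), set())

    def allocate_port(self, owner: str) -> int:
        port = len(self.queues)
        self.queues[port] = deque()
        # The creator holds the single receive right (and a send right).
        self.rights[(owner, port)] = {"receive", "send"}
        return port

    def grant_send(self, granter: str, grantee: str, port: int) -> None:
        if not self._has(granter, port, "send"):
            raise PermissionError(f"{granter} holds no send right for port {port}")
        self.rights.setdefault((grantee, port), set()).add("send")

    def send(self, task: str, port: int, msg) -> bool:
        if not self._has(task, port, "send"):
            raise PermissionError(f"{task} has no send right for port {port}")
        if len(self.queues[port]) >= self.queue_limit:
            return False      # bounded queue is full; a real kernel could block the sender
        self.queues[port].append(msg)
        return True

    def receive(self, task: str, port: int):
        if not self._has(task, port, "receive"):
            raise PermissionError(f"{task} has no receive right for port {port}")
        return self.queues[port].popleft() if self.queues[port] else None
```

A client can send only after the server grants it a send right, and only the server, as holder of the receive right, can dequeue.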



Port Sets
A group of ports sharing a common message queue
A thread can receive messages from a port set, thus servicing multiple ports
Messages are tagged with the actual port
A port can be a member of at most one port set
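The port-set rules above (one shared queue, messages tagged with their real port, at most one set per port) can be sketched with hypothetical `PortSet` and `PortTable` classes; rights checking is left out to keep the focus on the queueing.

```python
from collections import deque


class PortSet:
    """Member ports share one queue; each message is tagged with its actual port."""

    def __init__(self) -> None:
        self.members = set()
        self.queue = deque()

    def receive(self):
        """A single thread services all member ports by draining the shared queue."""
        return self.queue.popleft() if self.queue else None


class PortTable:
    def __init__(self) -> None:
        self.set_of = {}       # port -> the PortSet it belongs to (at most one)
        self.private = {}      # port -> its own queue while it is in no set

    def new_port(self, name: str) -> None:
        self.private[name] = deque()

    def add_to_set(self, port: str, pset: PortSet) -> None:
        if port in self.set_of:
            raise ValueError(f"{port} is already in a port set")
        self.set_of[port] = pset
        pset.members.add(port)
        # Move pending messages into the shared queue, tagged with their port.
        while self.private[port]:
            pset.queue.append((port, self.private[port].popleft()))

    def send(self, port: str, msg) -> None:
        pset = self.set_of.get(port)
        if pset is not None:
            pset.queue.append((port, msg))
        else:
            self.private[port].append(msg)
```

A server thread blocked on one port set thus sees, for example, `("net", "packet")` and then `("fs", "open")`, and can dispatch on the tag.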


Mach Messages
A typed collection of data objects
Unlimited size
Sent to a particular port
May contain actual data or a pointer to data
Port rights may be passed in a message
The kernel inspects messages for particular data types (like port rights)


Mach Memory Objects
A source of memory accessible by tasks
May be managed by a user-mode external memory manager, e.g., a file managed by a file server
Accessed by messages through a port
The kernel manages physical memory as a cache of the contents of memory objects


Mach Device Support
Devices are represented by ports
Messages control the device and its data transfer
The actual device driver is outside the kernel, in an external object


Mach Multiprocessor and DS Support

Messages and ports can extend across processor/machine boundaries: location-transparent entities

The kernel manages distributed hardware
Per-processor data structures, but also structures shared across the processors
Intermachine messages are handled by a server that knows about network details



Mach’s NetMsgServer
A user-level, capability-based networking daemon
Handles naming and transport for messages
Provides a world-wide name service for ports
Messages sent to off-node ports go through this server



NetMsgServer in Action

[Figure: two nodes, each split into user space and kernel space. The sender node has a user process and a NetMsgServer; the receiver node has a NetMsgServer and the receiving user process.]


Mach and User Interfaces

Mach was built for the UNIX community, but UNIX programs don’t know about ports, messages, threads, and tasks
How do UNIX programs run under Mach?
Mach typically runs a user-level server that offers UNIX emulation
It either provides UNIX system call semantics internally or translates them to Mach primitives


Windows NT
More layered than some microkernel designs
The NT microkernel provides base services
The Executive builds on the base services via modules to provide user-level services
User-level services are used by privileged subsystems (parts of the OS) and true user programs


Windows NT Diagram

[Figure: user processes and protected subsystems (Win32, POSIX) run in user mode above the Executive and microkernel, which run in kernel mode on the hardware.]


NT Microkernel
Thread scheduling
Process switching
Exception and interrupt handling
Multiprocessor synchronization
The only NT part that is not preemptible or pageable
All other NT components run in threads


NT Executive
Higher-level services than the microkernel
Runs in kernel mode, but separate from the microkernel itself, for ease of change and expansion
Built of independent modules, all preemptible and pageable


NT Executive Modules
Object manager
Security reference monitor
Process manager
Local procedure call facility (à la RPC)
Virtual memory manager
I/O manager


Typical Activity in NT

[Figure: a client process, the Win32 protected subsystem, the kernel and Executive, and the hardware.]


Windows NT Threads
An executable entity running in an address space
Scheduled by the kernel; handled by the kernel’s dispatcher
The kernel works with a stripped-down view of the thread: the kernel thread object
Multiple process threads can execute on distinct processors, even Executive ones


Microkernel Process Objects

A microkernel proxy for the real process

The microkernel’s interface to the real process

Contains pointers to the various resources owned by the process, e.g., threads and address spaces

Alterable only by microkernel calls


Microkernel Thread Objects

Just as microkernel process objects are proxies for the real process, microkernel thread objects are proxies for the real thread; there is one per thread

Contains minimal information about the thread: priorities, dispatching state

Used by the microkernel for dispatching


More On Microkernels
Microkernels were the research architecture of the 80s
But few commercial systems of the 90s really use microkernels
To some extent, “microkernel” is now a dirty word in OS design. Why?


Microkernel Construction

Most microkernels do not perform well. Is that inherent in the approach, or in the implementation?

IPC, the microkernel bottleneck, can be implemented an order of magnitude faster
Do not supervise memory: minimal address space management with grant, map, and flush
Fast kernel-user switch: usually 20-30 µs, but 3 µs in the L3 implementation
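The three address-space operations named above can be sketched following the semantics described in “On Micro-Kernel Construction”: map shares a page, grant transfers it, and flush revokes it from every space that received it (directly or indirectly) from the flusher, while the flusher keeps it. The `AddressSpace` and `MappingDB` classes are hypothetical simplifications of a mapping database.

```python
class AddressSpace:
    """Toy address space: virtual page -> physical frame."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.pages = {}


class MappingDB:
    """Records which mappings were derived from which, so flush can recurse."""

    def __init__(self) -> None:
        self.children = {}   # (space, vpage) -> list of (space, vpage) derived from it

    def map(self, src, sp, dst, dp) -> None:
        """Share a page: dst gets it and src keeps it."""
        dst.pages[dp] = src.pages[sp]
        self.children.setdefault((src, sp), []).append((dst, dp))

    def grant(self, src, sp, dst, dp) -> None:
        """Give a page away: dst gets it and src loses it; derived mappings follow it."""
        dst.pages[dp] = src.pages.pop(sp)
        self.children[(dst, dp)] = self.children.pop((src, sp), [])

    def flush(self, src, sp) -> None:
        """Revoke the page from every space that got it from src; src keeps its own copy."""
        for child, cp in self.children.pop((src, sp), []):
            self.flush(child, cp)
            child.pages.pop(cp, None)
```

Because the kernel only records who derived what from whom, a user-level pager can build an application's address space out of pages it was itself given, and the original owner can still revoke them all with one flush.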



Exokernel
Traditional operating systems fix the interface and implementation of OS abstractions.

Abstractions must be overly general to work with diverse application needs.

[Figure: applications sit on a FIXED interface of abstractions, which sits on the hardware.]



Example

[Figure: Apache and SQL Server both run on a traditional OS whose abstractions are FIXED, above the hardware.]



The Issues
Performance: denies applications the advantages of domain-specific optimizations
Flexibility: restricts the flexibility of application builders
Functionality: discourages changes to the implementations of existing abstractions



Performance
Example: a DB can have predictable data access patterns that don't fit the OS's LRU page replacement, causing bad performance.

Cao et al. found that application-controlled file caching can reduce running time by as much as 45%.

There is no single way to abstract physical resources, or to implement an abstraction, that is best for all applications.

The OS is forced to make trade-offs; the performance improvements from application-specific policies could be substantial.
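The LRU mismatch can be made concrete with a database-style cyclic scan over slightly more pages than fit in the cache. The sketch compares LRU with MRU (evict the most recently used page), used here only as an example of an application-specific policy; it is not the policy evaluated by Cao et al., and the trace and cache size are made up.

```python
from collections import OrderedDict


def hits(trace: list, capacity: int, policy: str) -> int:
    """Count cache hits for a page trace; policy is 'lru' or 'mru'."""
    cache = OrderedDict()        # keys ordered from least to most recently used
    n_hits = 0
    for page in trace:
        if page in cache:
            n_hits += 1
            cache.move_to_end(page)          # mark as most recently used
            continue
        if len(cache) >= capacity:
            # popitem(last=False) evicts the LRU page; last=True evicts the MRU page.
            cache.popitem(last=(policy == "mru"))
        cache[page] = True
    return n_hits
```

For five pages scanned four times through a four-frame cache, LRU evicts each page just before it is needed again and gets no hits, while MRU keeps most of the scan resident and gets 12 hits out of 20 accesses.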



Flexibility
Fixed high-level abstractions hide information from applications.
This makes it difficult or impossible for applications to implement their own resource management abstractions.



Functionality
There is only one available interface between applications and hardware resources.

Because all applications must share one set of abstractions, changes to these abstractions occur rarely, if ever.



The Solution
Separate protection from management

Allow user level to manage resources: application libraries implement OS abstractions

The exokernel exports resources through a low-level interface: it protects but does not manage, and exposes the hardware



Exokernel Philosophy

Applications know better than operating systems what the goal of their resource management decisions should be.
Applications should be given as much control as possible over those decisions.

Implementation view: [Figure: the exokernel exports the frame buffer, TLB, network, memory, and disk directly above the hardware.]



Example

[Figure: SQL Server runs on a library OS customized for SQL Server, and Apache runs on a library OS chosen from those available; both run on the exokernel, which provides application-level resource management on the hardware.]



Implementation Overview

Library OS, which uses the low-level exokernel interface to implement higher-level abstractions.

[Figure: a library OS above the hardware.]



Implementation Overview

Applications link to a library OS, leveraging its higher-level abstractions.

[Figure: each application is linked with its own library OS, above the hardware.]



End-to-End Argument
“If something has to be done by the user program itself, it is wasteful to do it in a lower level as well.”

Why should the OS do anything that the user program can do itself?

In other words, all an OS should do is securely allocate resources.



Exokernel design



Exokernel tasks
Track ownership
Guard all resources through bind points
Revoke access to resources



Design principle
Expose hardware (securely)
Expose allocation
Expose names
Expose revocation



Secure binding
Decouples authorization from use
Allows the kernel to protect resources without understanding their semantics
Example: TLB entry
The virtual-to-physical mapping is performed in the library (above the exokernel)
The binding is loaded into the kernel and used multiple times
Example: packet filter
Predicates are loaded into the kernel and checked on each packet arrival
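The packet-filter binding can be sketched as declarative predicates that the kernel evaluates without understanding the protocol: installing the filter is the bind step, and evaluating it on every arrival is the access step. Assumptions: a toy two-byte header (protocol byte, port byte) and filters made only of byte-equality checks, so the kernel can evaluate them safely. Real exokernel filters (DPF) are far richer, and a real kernel would also reject a filter that overlaps with another owner's.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Filter:
    """A downloaded predicate: every (offset, value) byte check must match."""
    owner: str
    checks: tuple   # ((offset, value), ...)

    def matches(self, packet: bytes) -> bool:
        return all(off < len(packet) and packet[off] == val for off, val in self.checks)


class ToyNetKernel:
    def __init__(self) -> None:
        self.filters = []      # installed once (bind time)
        self.delivered = {}    # owner -> packets delivered to it

    def install(self, f: Filter) -> None:
        self.filters.append(f)

    def on_packet(self, packet: bytes):
        """Run on every arrival (access time); return the owner, or None to drop."""
        for f in self.filters:
            if f.matches(packet):
                self.delivered.setdefault(f.owner, []).append(packet)
                return f.owner
        return None
```

The kernel never parses "TCP" or "DNS"; it only compares bytes at offsets the library chose, which is how it protects the binding without knowing its meaning.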



Implementing secure bindings

Hardware mechanisms: capabilities for physical pages of a file; frame buffer regions (SGI)

Software caching: the exokernel keeps a large software TLB overlaying the hardware TLB

Downloading code into the kernel: avoids expensive boundary crossings; similar to the SPIN idea



Examples of secure binding

Physical memory allocation (hardware-supported binding)
The library allocates a physical page
The exokernel records the allocator and the permissions and returns a “capability”, an encrypted cipher
Every access to this page by the library requires this capability

Page fault:
The kernel fields it
The kernel kicks it up to the library
The library allocates a page and gets an encrypted capability
The library calls the kernel to enter a particular translation into the TLB by presenting the capability
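The physical-page capability can be sketched with a keyed MAC standing in for the “encrypted cipher”: only the kernel knows the key, so a library cannot forge a capability for a frame it does not own. HMAC-SHA256 is a swapped-in choice, and every class and method name here is hypothetical.

```python
import hashlib
import hmac
import secrets


class ToyExokernelMemory:
    def __init__(self, n_frames: int = 8) -> None:
        self._key = secrets.token_bytes(32)   # known only to the kernel
        self.free = set(range(n_frames))
        self.owner = {}                       # frame -> library OS that allocated it
        self.tlb = {}                         # (libos, vpage) -> frame

    def _tag(self, libos: str, frame: int, perms: str) -> str:
        msg = f"{libos}:{frame}:{perms}".encode()
        return hmac.new(self._key, msg, hashlib.sha256).hexdigest()

    def alloc(self, libos: str, frame: int, perms: str = "rw") -> tuple:
        """The library asks for a specific frame; the kernel records the owner and returns a capability."""
        if frame not in self.free:
            raise ValueError(f"frame {frame} is not free")
        self.free.remove(frame)
        self.owner[frame] = libos
        return (frame, perms, self._tag(libos, frame, perms))

    def tlb_write(self, libos: str, vpage: int, cap: tuple) -> None:
        """Install a translation only if the presented capability verifies for this library."""
        frame, perms, tag = cap
        if self.owner.get(frame) != libos or not hmac.compare_digest(
                tag, self._tag(libos, frame, perms)):
            raise PermissionError("invalid capability")
        self.tlb[(libos, vpage)] = frame
```

The library chooses the virtual page and the frame (management); the kernel only verifies the capability before touching the TLB (protection). Reusing a valid tag for a different frame, or presenting another library's capability, is rejected.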



Downloading code into the kernel to establish secure bindings
Packet filter for demultiplexing network packets
Exactly like SPIN
How to ensure authenticity? Only trusted servers (library OSes) can download code into the kernel
Other uses of downloaded code
Execute code on behalf of an app that is not currently scheduled

E.g., an application handler for garbage collection could be installed in the kernel



Visible resource revocation

Most resources are visibly revoked, e.g., the processor or a physical page
The library can then perform the necessary action before relinquishing the resource
E.g., needed state saving for a processor
E.g., update of the page table



Abort protocol
A repossession exception is passed to the library OS
Repossession vector

Gives the library OS information about what was repossessed so that corrective action can be taken

The library OS can seed the vector to enable the exokernel to autosave (e.g., the disk blocks to which a physical page being repossessed should be written)
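Visible revocation with an abort fallback can be sketched as: ask the library OS first; if it does not cooperate, break the binding by force, write the page to the disk block the library seeded in advance, and record the repossession. All names are hypothetical, and a real exokernel would give the library a bounded amount of time rather than a single yes/no call.

```python
class LibOS:
    """Hypothetical library OS that may or may not cooperate with revocation."""

    def __init__(self, name: str, cooperative: bool = True) -> None:
        self.name = name
        self.cooperative = cooperative
        self.pages = set()
        self.swap_hint = {}      # page -> disk block, seeded in advance for autosave
        self.repossessed = []    # repossession vector: what was taken by force

    def on_revoke(self, page: int) -> bool:
        # Visible revocation: the library acts (saves state, fixes its page table) first.
        if self.cooperative:
            self.pages.discard(page)
            return True
        return False


class ToyExokernel:
    def __init__(self) -> None:
        self.disk = {}           # block -> (libos name, page) saved during an abort

    def revoke(self, libos: LibOS, page: int) -> str:
        if libos.on_revoke(page):
            return "released"
        # Abort protocol: break the binding by force.
        libos.pages.discard(page)
        block = libos.swap_hint.get(page)
        if block is not None:
            self.disk[block] = (libos.name, page)   # autosave where the library asked
        libos.repossessed.append(page)
        return "repossessed"
```

A cooperative library releases the page itself; an uncooperative one loses it anyway, but the seeded hint means its data lands in a known disk block and the repossession vector tells it what happened.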



Aegis – an exokernel



Aegis – processor time slice

Linear vector of time slots, scheduled round robin
An application can mark its “position” in the vector for scheduling
Timer interrupt
Beginning and end of time slices
Control is transferred to a library-specified handler for the actual saving/restoring
The time to save/restore is bounded

Penalty? Loss of a time slice next time!
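The time-slot vector can be sketched as a fixed array of slots visited round robin, where applications claim specific positions and an environment that overran its save/restore bound loses its next slice. The `SliceVector` class and slot counts are hypothetical, and the penalty is modeled as skipping exactly one slice.

```python
class SliceVector:
    """Toy Aegis-style processor schedule: a fixed vector of slots visited round robin."""

    def __init__(self, n_slots: int) -> None:
        self.slots = [None] * n_slots     # slot index -> owning environment
        self.penalized = set()            # environments that overran their save/restore bound

    def claim(self, env: str, positions: list) -> None:
        """An application asks for specific positions in the vector."""
        for p in positions:
            if self.slots[p] is not None:
                raise ValueError(f"slot {p} already owned by {self.slots[p]}")
        for p in positions:
            self.slots[p] = env

    def run(self, rounds: int) -> list:
        """Return the order in which environments get the CPU."""
        timeline = []
        for _ in range(rounds):
            for owner in self.slots:
                if owner is None:
                    continue              # unowned slot: idle
                if owner in self.penalized:
                    self.penalized.discard(owner)   # lose this one slice, then recover
                    continue
                timeline.append(owner)
        return timeline
```

An interactive editor that claims slots 0 and 4 of an 8-slot vector gets the CPU at evenly spaced points, while a batch job that claims slots 1 to 3 gets one long contiguous run.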



Aegis – processor environments

Exception context: program generated
Interrupt context: external, e.g., timer
Protected entry context: cross-domain calls
Addressing context: guaranteed mappings implemented by a software TLB mimicking the library OS page table



Aegis performance



Aegis – Address translation

On a TLB miss:
The kernel installs the hardware entry from the software TLB for guaranteed mappings
Otherwise, the application handler is called
The application establishes the mapping, and the TLB entry with its associated capability is presented to the kernel
The kernel installs it and resumes execution of the application



ExOS – library OS
IPC abstraction
VM
Remote communication using ASHs (application-specific safe handlers)

Takeaway: significant performance improvement is possible compared to a monolithic implementation



The Exokernel
A thin veneer that multiplexes and exports physical resources securely
Simplicity allows efficiency
The lower the level of a primitive, the more efficiently it can be implemented, and the more latitude it grants to implementers of higher-level abstractions.



The Exokernel
Resource management is restricted to allocation, revocation, sharing, and ownership tracking



Library operating systems

Use the low-level exokernel interface
Higher-level abstractions
Special-purpose implementations

An application can choose the library that best suits its needs, or even build its own.



Another Example



Design Challenge
How can an exokernel allow libOSes to freely manage physical resources while protecting them from each other?
Track ownership of resources
Secure bindings: a libOS can securely bind to machine resources
Guard all resource usage
Revoke access to resources



Secure Bindings
The exokernel allows libOSes to bind resources using secure bindings
Multiplex resources securely
Protection for mutually distrusting apps
Efficient



Secure Bindings
Secure binding: a protection mechanism that decouples authorization from actual use of a resource
Allows the kernel to protect resources without having to understand them



Guard all resource usage

Invisible resource revocation (traditional OS): efficient, since the application layer is not involved

Visible resource revocation (exokernel): allows the libOS to guide deallocation and track the availability of resources



Revoke access to resources

Abort protocol: allows the exokernel to break the secure bindings of an uncooperative libOS by force



Conclusion
An exokernel securely multiplexes the available raw hardware among applications

Application-level library operating systems implement higher-level traditional OS abstractions

LibOSes can specialize an implementation to suit a particular application



Conclusion
The lower the level of a primitive, the more efficiently it can be implemented, and the more latitude it gives to higher-level abstractions

So, separate management from protection:
Implement protection at a low level (exokernel)
Implement management at a higher level (libOS)



Some Features
It is possible to have different libOSes; for example, one could export a Unix API and another a Windows API



Exokernel vs. Microkernel

A microkernel provides abstractions of the hardware such as files, sockets, graphics, etc.

An exokernel provides almost raw access to the hardware.



Implementation Overview
Allows the extension, specialization, and even replacement of abstractions.

Example: page table implementations can vary from libOS to libOS, and applications can choose whichever is most suitable for their needs.

Exokernel



Implementation Principles
Provide libOSes maximum freedom while protecting them from each other. This is achieved through the separation of protection and resource management.

Resources should only be managed to the extent required for protection. LibOSes decide how best to use resources, with the exokernel arbitrating between competing libraries.

LibOSes should be able to request specific physical resources (like specific physical pages).

Resources should not be implicitly allocated; the libOS should participate in every allocation.




Exokernel Design
Secure bindings
Downloading code
Visible revocation
Abort protocol



Secure Bindings
A protection mechanism that decouples authorization (bind time) from actual use of the resource (access time).

Authorization is performed at bind time. Bindings are expressed in simple operations that the exokernel can implement quickly and efficiently.

Can protect resources without understanding them. Example:

When a page fault occurs, the virtual-to-physical address mapping is performed and the page is loaded by the exokernel (bind time), and then used multiple times (access time).




Downloading Code
Code can be downloaded into the exokernel for execution at defined events (like packet arrival).
Reduces kernel crossings.
Can execute even when the application isn't scheduled.
Can initiate events (e.g., initiate a response message to a packet).

Example: a packet filter is downloaded into the exokernel (bind time), then run on every incoming packet to determine the intended target application (access time), and can even initiate a response.




Visible Resource Revocation
Traditionally, OSes revoke (deallocate) resources invisibly, without application involvement (e.g., physical memory).

Advantage: lower latency. Disadvantage: applications cannot guide deallocation.

The exokernel uses visible revocation for most resources. The library OS is notified of the intention to deallocate and can guide the process.

Example: the libOS is told that the exokernel will deallocate physical page “5”; it can use this information to update its page table, or even to suggest a less important page for deallocation.




Abort Protocol
A mechanism to take away resources when libOSes fail to respond satisfactorily to visible revocation requests.
A repossession vector is used to keep track of forcibly deallocated resources. Library OSes can pre-load the vector with information that can be used to write state or data about the resource when it is deallocated (e.g., define disk blocks for memory paging).

OSes normally require certain allocations to be permanent, so the exokernel can guarantee a small number of resources that cannot be forcibly deallocated. Example: page tables, exception areas.




Implementation
Aegis: exokernel
Exports: processor, physical memory, TLB, exceptions, interrupts, and network interface.

ExOS: library OS
Implements: processes, virtual memory, user-level exceptions, interprocess abstractions, and network protocols (ARP, IP, UDP, NFS)

Compared to Ultrix




Aegis Processor Time Slices

Time slices are partitioned and allocated at clock granularity and scheduled round robin. Advanced scheduling can be implemented by a libOS by requesting specific positions in the time-slice vector.

Long-running apps can allocate contiguous time slices, while interactive apps can allocate several equidistant slices.




Aegis
Exceptions
Interrupts
Address translations: guarantees address mappings for a small number of pages, to simplify bootstrapping.
Protected control transfers: for IPC abstractions. Changes the program counter to an agreed location, sets up the appropriate context data for the callee, and donates the current time slice.
Dynamic packet filter




ExOS
IPC abstractions

pipe: ExOS uses a shared-memory buffer, an order of magnitude faster than Ultrix, which uses standard Unix pipes.

Application-level virtual memory
150x150 integer matrix multiplication: doesn't use any special ExOS or Aegis abilities; shows that application-level VM doesn't incur noticeable overhead (0.1 second difference).
All other tests perform comparably with Ultrix (reading pages, flipping protection bits, etc.)

Downloaded code for the networking handler: round-trip latency for RPC is faster than FRPC



117

ExOS Extensibility

Extensible Page-Table structures

Implemented inverted page tables

Extensible Schedulers

Stride scheduling (proportional-share scheduling): the test processes are successfully scheduled at a ratio of 3:2:1.
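The slide reports only the result, not ExOS's code. What follows is a generic sketch of stride scheduling (Waldspurger and Weihl's proportional-share technique): each client's stride is inversely proportional to its tickets, the client with the smallest pass value runs next, and its pass then advances by its stride. The constant `STRIDE1` and the tie-breaking rule (the earlier-listed client wins) are choices made for this sketch.

```python
import heapq

STRIDE1 = 720_720  # divisible by 1..16, so strides are exact for small ticket counts


def stride_schedule(tickets, quanta):
    """Simulate stride scheduling; return the quanta received per client.

    tickets: dict mapping client name -> ticket count.
    """
    heap = []
    for order, (name, count) in enumerate(tickets.items()):
        stride = STRIDE1 // count
        # Entries are (pass, order, name, stride); the initial pass is one stride.
        heapq.heappush(heap, (stride, order, name, stride))
    received = {name: 0 for name in tickets}
    for _ in range(quanta):
        pass_value, order, name, stride = heapq.heappop(heap)  # smallest pass runs
        received[name] += 1
        heapq.heappush(heap, (pass_value + stride, order, name, stride))
    return received


print(stride_schedule({"A": 3, "B": 2, "C": 1}, 60))  # {'A': 30, 'B': 20, 'C': 10}
```

Because allocation is deterministic rather than randomized (as in lottery scheduling), the 3:2:1 ratio holds exactly over every window of six quanta in this example.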



118

Conclusion

Experiments with Aegis and ExOS show that:

Simple exokernel primitives can be implemented efficiently

Fast low-level hardware multiplexing can be implemented efficiently

Traditional OS abstractions can be implemented at user level

Applications can create special-purpose implementations by modifying libraries



119

Other Exokernel Work

“Porting Multithreading Libraries to an Exokernel System”, Ernest Artiaga, Albert Serra, Marisa Gil, Dept. of Computer Architecture, Universitat Politecnica de Catalunya, ACM SIGOPS European Workshop, ACM 2000, pp. 121-126

Ported Cthreads to an exokernel: slightly faster execution than without threading



120

Other Exokernel Work

“Fast and Flexible Application-Level Networking on Exokernel Systems”, Gregory Ganger, Dawson Engler, et al., CMU, Stanford, MIT and Vividon, Inc., ACM Transactions on Computer Systems, vol. 20, no. 1, pp. 49-83, 2002

Implemented TCP, an HTTP server, and a web benchmarking tool

TCP: 50-300% higher throughput

HTTP: 3-8 times higher throughput

Benchmarking: can produce loads 2-8 times heavier



121

Key points of the paper

A microkernel should provide minimal abstractions: address spaces, threads, IPC

The abstractions are machine independent, but their implementation is hardware dependent for performance

Myths about the inefficiency of microkernels stem from inefficient implementations and NOT from the microkernel approach


122

What abstractions?

Determining criterion: functionality, not performance

The hardware and the microkernel should be trusted, but applications are not

The hardware provides page-based virtual memory; the kernel builds on this to provide protection for services above and outside the microkernel

Principles of independence and integrity:

Subsystems are independent of one another

The integrity of channels between subsystems is protected from other subsystems


123

Microkernel Concepts

The hardware provides address-space mapping from virtual pages to physical pages, implemented by page tables and the TLB

The microkernel's concept of address spaces hides the hardware address spaces and provides an abstraction that supports three operations: grant, map, and flush

These primitives allow building a hierarchy of protected address spaces
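To make the three primitives concrete, here is a minimal model. It assumes the semantics described in Liedtke's paper: map leaves the page in the mapper's space, grant moves it (the granter loses it), and flush revokes every mapping derived directly or indirectly from the flusher's page. The class `MappingDB` and the (space, page) tuples are hypothetical; a real microkernel keeps this information in its own mapping database.

```python
class MappingDB:
    """Hypothetical model of map/grant/flush (not a real microkernel API).

    A mapping is keyed by (space, vpage). Besides the physical frame, each
    mapping remembers which mappings were derived from it, so that flush
    can revoke them recursively.
    """

    def __init__(self):
        self.frame = {}     # (space, vpage) -> physical frame
        self.children = {}  # (space, vpage) -> derived (space, vpage) mappings

    def add_root(self, node, frame):
        # Only the initial space (A0 in these slides) receives frames "by magic".
        self.frame[node] = frame
        self.children[node] = []

    def map(self, src, dst):
        # src keeps its page; dst gets a derived mapping of the same frame.
        self.frame[dst] = self.frame[src]
        self.children[dst] = []
        self.children[src].append(dst)

    def grant(self, src, dst):
        # dst takes over src's mapping (and everything derived from it);
        # src loses the page.
        self.frame[dst] = self.frame.pop(src)
        self.children[dst] = self.children.pop(src)
        for kids in self.children.values():
            if src in kids:
                kids[kids.index(src)] = dst

    def flush(self, src):
        # Revoke every mapping derived, directly or indirectly, from src;
        # src itself keeps the page.
        for kid in self.children[src]:
            self.flush(kid)
            del self.frame[kid]
            del self.children[kid]
        self.children[src] = []


db = MappingDB()
db.add_root(("A0", 1), 42)      # A0 owns frame 42
db.map(("A0", 1), ("A1", 5))    # A1 now sees frame 42 too
db.map(("A1", 5), ("A2", 9))    # ...and passes it on to A2
db.grant(("A1", 5), ("A3", 2))  # A1 hands its page to A3 and loses it
db.flush(("A0", 1))             # A0 revokes everything derived from it
print(db.frame)                 # {('A0', 1): 42}
```

Recording the derivation tree is what makes recursive address spaces safe: whoever handed out a page can always take it back, however far it has been passed along.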


124

Address spaces

[Figure: the map, grant, and flush operations applied to a page R across address spaces A1, A2, and A3]


125

Power and flexibility of address spaces

The initial memory manager, for address space A0, appears by magic (similar to SPIN's core services, BUT outside the kernel) and encompasses all of physical memory

Allows the creation of stackable memory managers (all outside the kernel)

Pagers can be part of a memory manager or outside it

All address-space changes (map, grant, flush) are orchestrated via the kernel for protection

Device drivers can also be implemented as special memory managers outside the kernel


126

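[Figure: stacked memory managers (M0, A0, P0), (M1, A1, P1), and (M2, A2, P2), each with its own page table (PT), built above the microkernel and processor and connected by map/grant]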


127

Threads and IPC

A thread executes in an address space: PC, SP, processor registers, and state info (such as the address space)

IPC is cross-address-space communication, supported by the microkernel

The classic method is message passing between threads via the kernel

The sender sends info; the receiver decides if it wants to receive it, and if so, where

Address-space operations such as map and grant need IPC, since they require agreement between the two parties (flush does not)

Higher-level communication (e.g., RPC) is built on top of basic IPC
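The "receiver decides" rule can be sketched as follows. This is a hypothetical model, not an actual microkernel API: IPC is assumed to be synchronous, so a send completes only if the receiver is already waiting and accepts that sender; otherwise the sender would block, which the sketch signals by returning False.

```python
class IpcKernel:
    """Hypothetical model of kernel-mediated, synchronous message passing.

    Assumption: a send completes only if the receiver is already waiting and
    is willing to accept this sender; otherwise the sender would block
    (modelled here by returning False).
    """

    def __init__(self):
        self.waiting = {}  # receiver id -> (accepted sender or None, buffer)

    def receive(self, receiver, from_sender=None, buffer=None):
        # The receiver decides whether to receive (and from whom) and where
        # the data should go.
        self.waiting[receiver] = (from_sender, buffer if buffer is not None else [])

    def send(self, sender, receiver, payload):
        if receiver not in self.waiting:
            return False  # receiver not ready: sender would block
        accepted, buffer = self.waiting[receiver]
        if accepted is not None and accepted != sender:
            return False  # receiver only accepts a different sender
        del self.waiting[receiver]
        buffer.append((sender, payload))  # kernel copies into the receiver's buffer
        return True


k = IpcKernel()
inbox = []
k.receive("t2", from_sender="t3", buffer=inbox)  # t2 will only accept t3
print(k.send("t1", "t2", "hi"))     # False: t2 refuses t1, so t1 would block
print(k.send("t3", "t2", "hello"))  # True
print(inbox)                        # [('t3', 'hello')]
```

Higher-level mechanisms such as RPC can then be layered on a send followed by a receive from the same partner.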


128

Interrupts?

Each hardware device is a thread from the kernel’s perspective

An interrupt is a null message from a hardware thread to a software thread

The kernel transforms a hardware interrupt into a message; it does not know or care about the semantics of the interrupt

Device-specific interrupt handling is done outside the kernel

Clearing hardware state (if privileged) is then carried out by the kernel upon the driver thread’s next IPC

TLB handler?

In theory, a software TLB handler can be outside the microkernel

In practice, the first-level TLB handler is inside the microkernel or in hardware
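The interrupt-as-message idea, including the deferred clearing of hardware state, can be sketched like this. The class, the ("irq", n) sender IDs, and the use of line masking as the privileged clearing step are assumptions for illustration; the kernel in this sketch never interprets the interrupt, it only routes a null message.

```python
from collections import deque


class InterruptRouter:
    """Hypothetical sketch: the kernel turns hardware interrupts into IPC.

    Each IRQ line acts as a pseudo-thread with id ("irq", n). The kernel never
    interprets the interrupt; it only queues a null message for the attached
    driver thread. Masking the line stands in (assumption) for the privileged
    hardware state that must be cleared afterwards.
    """

    def __init__(self):
        self.driver_for = {}  # irq number -> driver thread id
        self.inbox = {}       # driver thread id -> deque of sender ids
        self.masked = set()   # IRQ lines currently masked
        self.in_service = {}  # driver thread id -> IRQs it is handling

    def attach(self, irq, driver):
        self.driver_for[irq] = driver
        self.inbox.setdefault(driver, deque())

    def hardware_interrupt(self, irq):
        # Low-level entry: mask the line and send a null message (sender id only).
        self.masked.add(irq)
        self.inbox[self.driver_for[irq]].append(("irq", irq))

    def driver_ipc(self, driver):
        # The driver's next IPC means it has finished with the previous
        # interrupt, so the kernel now does the privileged clearing step.
        for irq in self.in_service.pop(driver, set()):
            self.masked.discard(irq)
        if not self.inbox[driver]:
            return None  # a real driver thread would block here
        sender = self.inbox[driver].popleft()
        self.in_service.setdefault(driver, set()).add(sender[1])
        return sender


r = InterruptRouter()
r.attach(14, "disk_driver")
r.hardware_interrupt(14)
print(r.driver_ipc("disk_driver"))  # ('irq', 14): the null message
print(14 in r.masked)               # True: still being handled
r.driver_ipc("disk_driver")         # next IPC: kernel unmasks line 14
print(14 in r.masked)               # False
```

All device-specific work happens in the driver thread between its two IPC calls; the kernel part stays generic.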


129

Unique IDs

The kernel provides IDs that are unique over space and time for threads and IPC channels
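One common way to make IDs unique over both space and time is to combine a location component with a per-slot generation number. The sketch below is hypothetical (the slide does not say how the kernel encodes its IDs): the node identifies where the thread was created, and the generation is bumped each time a slot is reused, so a stale ID can never name a newer thread.

```python
class IdAllocator:
    """Hypothetical sketch of IDs that are unique over space and time.

    Space: the id embeds the node (or processor) that created it.
    Time: each reusable slot carries a generation counter that is bumped on
    every reuse, so a stale id never names a newer thread.
    """

    def __init__(self, node, n_slots):
        self.node = node
        self.generation = [0] * n_slots
        self.free = list(range(n_slots))
        self.live = set()

    def allocate(self):
        slot = self.free.pop(0)
        tid = (self.node, slot, self.generation[slot])
        self.live.add(tid)
        return tid

    def release(self, tid):
        _node, slot, gen = tid
        self.live.discard(tid)
        self.generation[slot] = gen + 1  # the next holder of this slot gets a new id
        self.free.append(slot)

    def is_valid(self, tid):
        return tid in self.live


ids = IdAllocator(node=0, n_slots=1)
old = ids.allocate()  # (0, 0, 0)
ids.release(old)
new = ids.allocate()  # (0, 0, 1): same slot, new generation
print(old, new, ids.is_valid(old))
```

A message addressed to `old` can therefore be rejected instead of silently reaching whatever thread now occupies the slot.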


130

Breaking some performance myths

Kernel-user switches

Address space switches

Thread switches and IPC

Memory effects

Base system: 486 (50 MHz) – 20 ns cycle time


131

Kernel-user switches

Machine instructions for entering and exiting the kernel: 107 cycles

Mach measures 900 cycles for a kernel-user switch. Why?

Empirical proof: the L3 kernel takes ~123 cycles (accounting for some TLB and cache misses)

Where did the remaining ~800 cycles go in Mach? Kernel overhead (from the construction of the kernel, and inherent in the approach)
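The cycle counts convert to wall-clock time at 20 ns per cycle. The short calculation below uses only the figures on these slides. Note that 900 - 107 = 793 cycles, which is the "remaining ~800 cycles" attributed to Mach's kernel overhead.

```python
CLOCK_HZ = 50e6             # 486 at 50 MHz (base system on the slides)
CYCLE_NS = 1e9 / CLOCK_HZ   # -> 20 ns per cycle


def cycles_to_us(cycles):
    """Convert a cycle count to microseconds on the 50 MHz base system."""
    return cycles * CYCLE_NS / 1000


measured = {
    "enter + exit kernel instructions": 107,
    "L3 kernel-user switch": 123,
    "Mach kernel-user switch": 900,
}
for name, cycles in measured.items():
    print(f"{name}: {cycles} cycles = {cycles_to_us(cycles):.2f} us")

overhead = measured["Mach kernel-user switch"] - measured["enter + exit kernel instructions"]
print(f"Mach overhead beyond the instructions themselves: {overhead} cycles")
```

In other words, L3's switch costs about 2.5 µs, while Mach's costs about 18 µs on the same hardware.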


132

Address space switches

Primer on TLBs: address-space-tagged TLB (MIPS R4000) vs. untagged TLB (486)

An untagged TLB requires a flush on every address-space switch

Instruction and data caches are physically tagged in most modern processors, so a TLB flush has no effect on them

Address-space switch: a complete reload of the Pentium TLB costs ~864 cycles
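The cost difference between the two TLB designs can be seen with a toy simulation. The sketch assumes two address spaces that alternately touch the same eight virtual pages, a 64-entry TLB, and FIFO replacement. These parameters are illustrative, not measurements of the 486 or the R4000.

```python
def count_tlb_misses(trace, tagged, capacity=64):
    """Count TLB misses for a trace of (address_space, virtual_page) accesses.

    Untagged TLB (486-style): entries carry no address-space id, so the whole
    TLB is flushed on every address-space switch.
    Tagged TLB (R4000-style): entries are keyed by (space, page), so nothing
    needs flushing. Replacement is FIFO here for simplicity (an assumption).
    """
    tlb = []  # FIFO of cached keys
    misses = 0
    current = None
    for space, page in trace:
        if space != current:
            if not tagged:
                tlb.clear()  # untagged: the old space's translations are invalid
            current = space
        key = (space, page) if tagged else page
        if key not in tlb:
            misses += 1
            tlb.append(key)
            if len(tlb) > capacity:
                tlb.pop(0)
    return misses


# Ten rounds of: space A touches pages 0-7, then space B touches pages 0-7.
trace = [(space, page) for _ in range(10) for space in ("A", "B") for page in range(8)]
print(count_tlb_misses(trace, tagged=False))  # 160: every switch refills the TLB
print(count_tlb_misses(trace, tagged=True))   # 16: only the first touch of each page misses
```

If the combined working sets outgrow the TLB, even a tagged TLB misses constantly, which is why switches between large address spaces stay expensive.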


133

Do we always need a TLB flush? It depends on how “protection domains” are implemented

SPIN implements protection domains as Modula-3 names within a single hardware address space

Liedtke suggests a similar approach in the microkernel, in an architecture-specific manner:

PowerPC: use segment registers => no flush

Pentium or 486: share the linear hardware address space among several user address spaces => no flush

There are some caveats regarding the size of each user space and how many can be “packed” into a 2^32 global space
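Packing several small address spaces into one linear space can be sketched with a base/limit segment per space. The region base (0xC000_0000) and size (512 MiB) are hypothetical parameters chosen for illustration. The point is that distinct small spaces occupy distinct linear addresses, so their TLB entries can coexist and a switch only reloads the segment base.

```python
LINEAR_SPACE = 2 ** 32           # 32-bit linear address space
SMALL_REGION_BASE = 0xC000_0000  # hypothetical: where small spaces are packed
SMALL_REGION_SIZE = 2 ** 29      # hypothetical: 512 MiB reserved for them
assert SMALL_REGION_BASE + SMALL_REGION_SIZE <= LINEAR_SPACE


class SmallSpaces:
    """Sketch: pack several small user address spaces into one linear space.

    Each small space is a (base, limit) segment inside the shared region.
    Different small spaces map to different linear addresses, so switching
    between them changes the segment base instead of flushing the TLB.
    """

    def __init__(self):
        self.next_free = SMALL_REGION_BASE
        self.segments = {}  # name -> (base, limit)

    def create(self, name, size):
        end = self.next_free + size
        if end > SMALL_REGION_BASE + SMALL_REGION_SIZE:
            # The caveat from the slide: only so many spaces fit in 2^32.
            raise MemoryError("small-space region full: use a large space")
        self.segments[name] = (self.next_free, size)
        self.next_free = end
        return self.segments[name]

    def translate(self, name, offset):
        base, limit = self.segments[name]
        if not 0 <= offset < limit:
            raise ValueError("segment limit violation")  # hardware would fault
        return base + offset


spaces = SmallSpaces()
spaces.create("a", 2 ** 20)
spaces.create("b", 2 ** 20)
print(hex(spaces.translate("a", 0x10)))  # 0xc0000010
print(hex(spaces.translate("b", 0x10)))  # 0xc0100010
```

With 64 MiB per space, for instance, only eight small spaces fit in this hypothetical 512 MiB region, which illustrates the packing caveat.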


134

Upshot? Address-space switching among medium or small protection domains can ALWAYS be made efficient by careful construction of the microkernel

Switches between large address spaces are ALWAYS going to be expensive due to cache and TLB effects, so the switching cost itself is not the most critical issue


135

Thread switches and IPC


136

Segment switch (instead of AS switch) makes cross domain calls cheap


137

Memory Effects – System


138

Capacity-induced MCPI (memory cycles per instruction)


139

Portability vs. Performance

A microkernel built on top of abstract hardware, while portable:

cannot exploit hardware features

cannot take precautions to avoid performance problems specific to an architecture

incurs a performance penalty due to the abstraction layer


140

Examples of non-portability

Same processor family:

Address-space switch implementation: the TLB flush method is preferable on the 486, the segment-register switch on the Pentium => a 50% change to the microkernel!

IPC implementation: details of the cache layout (associativity) require different handling of IPC buffers on the 486 and the Pentium

Incompatible processors: Exokernel on the R4000 (tagged TLB) vs. the 486 (untagged TLB)

=> Microkernels are inherently non-portable


141

Summary

Minimal set of abstractions in the microkernel

Microkernels are processor specific (at least in implementation) and non-portable

The right abstractions with a processor-specific implementation lead to efficient, processor-independent abstractions at higher layers


142

Performance


143

Key points

Goal: extensibility, akin to the SPIN and Exokernel goals

Main difference: support running several commodity operating systems on the same hardware simultaneously, without sacrificing performance or functionality

Why? Application mobility, server consolidation, co-located hosting facilities, distributed web services, …


144

Multiprocessor OS

Synchronization

Communication

Scheduling

We have seen these issues already in the other readings in this section of the course


145

Key Issues

Modern parallel machines:

Large system sizes stress bottlenecks in system software (e.g., global data structures)

Higher memory latencies

NUMA effects (i.e., the symmetric assumption does not hold)

Cache hierarchy:

Write sharing is expensive due to coherence traffic

False sharing due to large cache lines


146

Thesis of Tornado paper

In designing a multiprocessor OS, pay attention to locality:

Reduce shared system data structures

Reduce the distance between the accessing processor and the target memory module


147

Effect of global data structure – shared counter


148

Tornado design approach

Object-oriented design for scalability

Clustered objects

Protected procedure call, with a view to preserving locality while ensuring concurrency

Semi-automatic garbage collection for localizing locking

OS objects have multiple implementations: a low-overhead version when scalability is not required, and a scalable implementation when performance is critical

Optimize the common case:

Object invocation should be fast; object creation/destruction can be slower

Page fault handling should be fast; memory region creation/deletion can be slower
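The shared-counter problem and the clustered-object remedy can be sketched as two implementations behind one interface. This is a simplified illustration, not Tornado's code: Python cannot show cache-line traffic, so the sketch only shows the structure. The clustered version keeps one representative count per processor, increments touch only the local representative, and the (rarer) read combines them.

```python
class SharedCounter:
    """Low-overhead version: one global value. On a real multiprocessor every
    increment writes the same cache line, which then bounces between CPUs."""

    def __init__(self):
        self.value = 0

    def increment(self, cpu):
        self.value += 1  # cpu is ignored: everyone updates the same word

    def read(self):
        return self.value


class ClusteredCounter:
    """Scalable version (simplified): one representative per processor.

    Increments touch only the calling processor's rep; reads, assumed to be
    rare, sum all reps. Reps are created lazily on first use by each CPU.
    """

    def __init__(self):
        self.reps = {}  # cpu id -> local count

    def increment(self, cpu):
        self.reps[cpu] = self.reps.get(cpu, 0) + 1

    def read(self):
        return sum(self.reps.values())


for counter in (SharedCounter(), ClusteredCounter()):
    for cpu in range(4):
        for _ in range(100):
            counter.increment(cpu)
    print(type(counter).__name__, counter.read())  # both report 400
```

Because both classes expose the same `increment`/`read` interface, the OS can start with the cheap version and switch to the clustered one only where contention matters, matching the "multiple implementations" point above.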


149

Next Lecture: Process and Thread

“Cooperative Task Management Without Manual Stack Management”, by Atul Adya, et al.

“Capriccio: Scalable Threads for Internet Services”, by Rob von Behren, et al.

“The Performance Implications of Thread Management Alternatives for Shared-Memory Multiprocessors”, by Thomas E. Anderson, et al.
