Real World Multicore Embedded Systems: System Virtualization in Multicore Systems

CHAPTER 7

System Virtualization in Multicore Systems

David Kleidermacher
Chief Technology Officer, Green Hills Software, Inc., Santa Barbara, CA, USA

Chapter Outline
What is virtualization?
A brief retrospective
Applications of system virtualization
    Environment sandboxing
    Virtual appliances
    Workload consolidation
    Operating system portability
    Mixed-criticality systems
    Maximizing multicore processing resources
    Improved user experience
Hypervisor architectures
    Type 2
    Type 1
    Paravirtualization
    Monolithic hypervisor
    Console guest hypervisor
    Microkernel-based hypervisor
    Core partitioning architectures
Leveraging hardware assists for virtualization
    Mode hierarchy
    Intel VT
    Power architecture ISA 2.06 embedded hypervisor extensions
    ARM TrustZone
    ARM Virtualization Extensions
Hypervisor robustness
    SubVirt
    Blue pill
    Ormandy
    Xen owning trilogy
    VMware's security certification
I/O Virtualization
    Peripheral virtualization architectures
        Emulation
        Pass-through
        Mediated pass-through
    Peripheral sharing architectures
        Hardware multiplexing
        Hypervisor software multiplexing
        Bridging
        Dom0/console guest
        Combinations of I/O virtualization approaches
        I/O virtualization within microkernels
Case study: Power Architecture virtualization and the Freescale P4080
    Power Architecture hardware hypervisor features
    Power Architecture virtualized memory management
    Freescale P4080 IOMMU
    Hardware support for I/O sharing in the P4080
    Virtual machine configuration in Power Architecture
Example use cases for system virtualization
    Telecom blade consolidation
    Electronic flight bag
    Intelligent Munitions System
    Automotive infotainment
    Medical imaging system
Conclusion
References

Real World Multicore Embedded Systems. 2013 Elsevier Inc. All rights reserved.

What is virtualization?

With respect to computers, virtualization refers to the abstraction of computing resources. Operating systems offer virtualization services by abstracting the details of hard disks, Ethernet controllers, CPU cores, and graphics processing units with convenient developer APIs and interfaces, for example file systems, network sockets, SMP thread scheduling, and OpenGL. An operating system simplifies the life of electronic product developers, freeing them to focus on differentiation.

Of course, this environment has grown dramatically up the stack.
Instead of just providing a TCP/IP stack, embedded operating systems are increasingly called on to provide a full suite of web protocols, remote management functions, and other middleware algorithms and protocols. Furthermore, as multicore SoCs become more widespread, application and real-time workloads are being consolidated, so the operating system must support general-purpose applications while responding instantly to real-time events and protecting sensitive communications interfaces from corruption.

In essence, the typical operating system abstractions (files, devices, network communication, graphics, and threads) are proving insufficient for emerging multicore embedded applications. In particular, developers need the flexibility of abstracting the operating system itself. The networking example of control- and data-plane workload consolidation, with a Linux-based control plane and a real-time data plane, is but one use case. Mixed-criticality applications are becoming the norm; a general-purpose operating system with its sophisticated middleware and open source software availability is being combined with safety-, availability-, security-, and/or real-time-critical applications that benefit from isolation from the general-purpose environment.

System virtualization refers specifically to the abstraction of an entire computer (machine), including its memory architecture, peripherals, multiple CPU cores, etc. The machine abstraction enables the execution of complete general-purpose guest operating systems. A hypervisor is the low-level software program that presents and manages the virtual machines, scheduling them across the available cores. The hypervisor appropriates the role of SoC hardware management traditionally relegated to the operating system and pushes the operating system up into the virtual machine.
The guest operating systems are permitted to access resources at the behest of the hosting hypervisor.

This chapter is dedicated to the discussion of system virtualization in embedded systems, with an emphasis on modern multicore platforms. To capture theoretical multicore processor performance gains, the first obvious approach is to use fine-grained instruction-level or thread-level parallelism. This approach has been the subject of research and investment for years, with parallelizing compilers, parallel programming languages (and language extensions, such as OpenMP), and common operating system multithreading. These fine-grained parallelization approaches have achieved limited success in unlocking the full potential of multicore horsepower. However, it has been realized that embedded system architectures in the telecommunication, networking, and industrial control and automation segments are already naturally partitioned into data, control, and management planes. This separation has driven strong interest amongst original equipment manufacturers (OEMs) to map the entire system to a multicore system-on-chip (SoC) in a manner that simply integrates the previously stand-alone functions onto a single device. System virtualization is the obvious mechanism to realize this mapping.

A brief retrospective

Computer system virtualization was first introduced in mainframes during the 1960s and '70s.
Although virtualization remained a largely untapped facility during the '80s and '90s, computer scientists have long understood many of the applications of system virtualization, including the ability to run distinct and legacy operating systems on a single hardware platform.

At the start of the millennium, VMware proved the practicality of full system virtualization, hosting unmodified, general-purpose guest operating systems, such as Windows, on common Intel Architecture (IA)-based hardware platforms.

In 2005, Intel launched its Virtualization Technology (Intel VT), which both simplified and accelerated virtualization. Consequently, a number of virtualization software products have emerged, alternatively called virtual machine monitors or hypervisors, with varying characteristics and goals. Similar hardware assists for system virtualization have emerged in other popular embedded CPU architectures, including ARM and Power.

While virtualization may be best known for its application in data center server consolidation and provisioning, the technology has proliferated across desktop and laptop-class systems, and has most recently found its way into mobile and embedded environments. This evolution is no different from, and is related to, the migration of multicore processors into the embedded and mobile space. System virtualization enables developers to make better use of multicore processors by enabling the hosting of a wider range of concurrent workloads. Similarly, multicore processors enable virtualization to be more practical to implement from a performance efficiency perspective. Multicore and virtualization are certainly symbiotic technologies.
The availability of system virtualization technology across a wide range of computing platforms provides developers and technologists with the ultimate open platform: the ability to run any flavor of operating system in any combination, creating an unprecedented flexibility for deployment and usage.

Yet embedded multicore systems often have drastically different performance and reliability constraints as compared to multicore server computing. We will also focus on the impact of hypervisor architecture upon these constraints.

Applications of system virtualization

Mainframe virtualization was driven by some of the same applications found in today's enterprise systems. Initially, virtualization was used for time-sharing multiple single-user operating systems, similar to the improved hardware utilization driving modern data center server consolidation. Another important usage involved testing and exploring new operating system architectures. Virtualization was also used to maintain backwards compatibility of legacy versions of operating systems.

Environment sandboxing

Implicit in the concept of consolidation is the premise that independent virtual machines are kept reliably separated from each other (also referred to as virtual machine isolation). The ability to guarantee separation is highly dependent upon the robustness of the underlying hypervisor software. As we will soon expand upon, researchers have found flaws in commercial hypervisors that violate this separation assumption. Nevertheless, an important theoretical application of virtual machine compartmentalization is to isolate untrusted software.
For example, a web browser connected to the Internet can be sandboxed in a virtual machine so that Internet-borne malware or browser vulnerabilities are unable to infiltrate or otherwise adversely impact the user's primary operating system environment.

Virtual appliances

Another example, the virtual appliance, does the opposite: sandbox, or separate, trusted software away from the embedded system's primary operating system environment. Consider anti-virus software that runs on a mobile device. A few years ago, the Metal Gear Trojan was able to propagate itself across Symbian operating-system-based mobile phones by disabling their anti-malware software. Virtualization can solve this problem by placing the anti-malware software into a separate virtual machine (Figure 7.1).

Figure 7.1: Improved robustness using isolated virtual appliances.

The virtual appliance can analyze data going into and out of the primary application environment or hook into the platform operating system for demand-driven processing.

Workload consolidation

As microprocessors get more powerful, developers of sophisticated embedded systems look to consolidate workloads running on discrete processors into a single, more powerful platform. System virtualization is a natural mechanism to support this migration since complete operating systems and their applications can be moved wholesale into a virtual machine with minimal or no changes to code.

Operating system portability

Operating system portability is one of the original drivers of system virtualization. IBM used virtualization to enable older operating systems to run on top of newer hardware (which did not support native execution of the older operating system). This remains an important feature today. For example, system designers can use virtualization to deploy a known "gold" operating system image, even if the SoC supplier does not officially support it.
In some high-criticality markets, such as medical and industrial control, developers may need to be more conservative about the rate of operating system upgrades than in a consumer-oriented market.

Mixed-criticality systems

A by-product of workload consolidation is the increased incidence of mixed-criticality computing. Critical functions include hard real-time, safety-critical, and/or security-critical applications that may not be suitable for a general-purpose operating system such as Linux. System virtualization enables mixed criticality by providing sandboxes between general-purpose environments and highly critical environments (such as that running on a real-time operating system). Later in this chapter we provide a case study of mixed-criticality automotive electronics.

Maximizing multicore processing resources

Many of these consolidation and mixed-criticality system requirements map well to multicore platforms. SMP-capable hypervisors enable multiple SMP-capable guest operating systems to share all available cores, making it far more likely that those cores will be put to good use. Furthermore, when the overall system workload is light, the multicore-aware hypervisor can dynamically change the number of cores available to guest operating systems, enabling unused cores to be turned off to save power.

Improved user experience

System virtualization can improve user experience relative to a traditional stovepipe distributed-computing approach by breaking down physical component and wiring barriers. For example, consider modern automotive head-units (the center dash computer that typically includes radio, navigation, and other infotainment functions) and clusters (the driver information system that includes speedometer, odometer, and other gauges).
In a consolidated design that incorporates both head-unit and cluster functionality, designers may find it easier to mix different types of information in a heads-up display (part of the cluster) because the head-unit and cluster systems are running on the same platform. Communication between the cluster and infotainment systems is extremely fast (shared memory), and the consolidated architecture will encourage automotive OEMs to take a more central role in a synergistic design, instead of relying on separate hardware implementations from Tier 1 companies (that is, those that supply the OEMs directly) that are not co-developed.

Hypervisor architectures

Hypervisors are found in a variety of flavors. Some are open source; others are proprietary. Some use thin hypervisors augmented with specialized guest operating systems. Others use a monolithic hypervisor that is fully self-contained. In this section, we'll compare and contrast the available technologies.

Hypervisors perform two major functions: resource management and resource abstraction. The management side deals with allocating and controlling access to hardware resources, including memory, CPU time, and peripherals. The abstraction side corresponds to the virtual machine concept: manifesting an illusion of an actual hardware platform. Typically, a general-purpose operating system executes within each virtual machine instance.

Type 2

The early PC hypervisors (VMware Workstation, Parallels, and User-Mode Linux) took advantage of the fact that PC operating systems made virtualization relatively easy: the underlying host operating system provided both resource management (CPU time scheduling, memory allocation) as well as a means to simplify resource abstraction by reusing the host operating system's device drivers and communications software and APIs. These early hypervisors are now known as Type 2 hypervisors: running on top of a conventional operating system environment.
Type 2 architecture is shown in Figure 7.2.

Figure 7.2: Type-2 hypervisor architecture.

Type 1

The original IBM mainframe hypervisors implemented both resource management and abstraction without depending on an underlying host operating system. In the PC server world, both VMware ESX and Xen apply this same architecture, so-called Type 1 because they run directly on the host's hardware. This architecture is shown in Figure 7.3. The primary motivation for Type 1 hypervisors is to avoid the complexity and inefficiency imposed by a conventional host operating system. For example, a Type 2 hypervisor running on Linux could be subverted by one of the multitudinous vulnerabilities in the millions of lines of code in the Linux kernel.

However, the traditional Type 1 architecture has its own drawbacks. Type 1 has no ability to provide a non-virtualized process environment. Whereas a Type 2 environment can mix and match native applications running on the host operating system with guests, a Type 1 hypervisor is purpose-built to execute guests only. Another issue with legacy Type 1 hypervisors is that they must reinvent the device drivers and hardware management capabilities that conventional operating systems already provide, tuned for performance and robustness across applications and years of field deployment. Some Type 1 hypervisors employ a specialized guest to provide device I/O services on behalf of other guests. In this approach, the size of the software base that must be trusted for overall system robustness is no better than a Type 2.

Paravirtualization

System virtualization can be implemented with full virtualization or paravirtualization, a term first coined in the 2001 Denali project [1]. With full virtualization, unmodified guest operating systems are supported.
With paravirtualization, the guest operating system is modified in order to improve the ability of the underlying hypervisor to achieve its intended function.

Figure 7.3: Type-1 hypervisor architecture.

Paravirtualization is sometimes able to provide improved performance. For example, device drivers in the guest operating system can be modified to make direct use of the I/O hardware instead of requiring I/O accesses to be trapped and emulated by the hypervisor. Paravirtualization often includes the addition of specialized system calls into guest operating system kernels that enable them to request services directly from the hypervisor. These system calls are referred to as hypercalls. For example, if the guest kernel running in user mode (because it is executing in a virtual machine) wishes to disable interrupts, the standard supervisor-mode instruction used to disable interrupts will no longer work. This instruction will fault, and the hypervisor can then attempt to discern and emulate what the kernel was trying to do. Alternatively, the kernel can make a hypercall that explicitly asks the hypervisor to disable interrupts. Paravirtualization may be required on CPU architectures that lack hardware virtualization acceleration features.

The key advantage of full virtualization over paravirtualization is the ability to use unmodified versions of guest operating systems that have a proven fielded pedigree and do not require the maintenance associated with custom modifications. This maintenance saving is especially important in enterprises that use a variety of operating systems and/or regularly upgrade to new operating system versions and patch releases.

Note that this section compares and contrasts so-called Type-1 hypervisors that run on bare metal. Type-2 hypervisors run atop a general-purpose operating system, such as Windows or Linux, which provides I/O and other services on behalf of the hypervisor.
Because they can be no more robust than their underlying general-purpose host operating systems, Type-2 hypervisors are not suitable for most embedded systems deployments and have historically been avoided in such environments. Thus, Type-2 technology is omitted from the following discussion.

Monolithic hypervisor

Hypervisor architectures most often employ a monolithic architecture. Similar to monolithic operating systems, the monolithic hypervisor requires a large body of operating software, including device drivers and middleware, to support the execution of one or more guest environments. In addition, the monolithic architecture often uses a single instance of the virtualization component to support multiple guest environments. Thus, a single flaw in the hypervisor may result in a compromise of the fundamental guest environment separation intended by virtualization in the first place.

Console guest hypervisor

An alternative approach uses a trimmed-down hypervisor that runs in the microprocessor's most privileged mode but employs a special guest operating system partition called the console guest to handle the I/O control and services for the other guest operating systems (Figure 7.4). Examples of this architecture include Xen, Microsoft Hyper-V, and Red Bend VLX. Xen pioneered the console guest approach in the enterprise; within Xen, the console guest is called Domain 0, or Dom0 for short. Thus, the console guest architecture is sometimes referred to as the Dom0 architecture. With the console guest approach, a general-purpose operating system must still be relied upon for system reliability. A typical console guest such as Linux may add far more code to the virtualization layer than is found in a monolithic hypervisor.

Microkernel-based hypervisor

The microkernel-based hypervisor, a form of Type-1 architecture, is designed specifically to provide robust separation between guest environments.
Figure 7.4: Console guest or Dom0 hypervisor architecture.

Figure 7.5 shows the microkernel-based hypervisor architecture. Because the microkernel is a thin, bare-metal layer, the microkernel-based hypervisor is considered a Type-1 architecture. This architecture adds computer virtualization as a service on top of the trusted microkernel. In some cases, a separate instance of the virtualization component is used for each guest environment. Thus, the virtualization layer need only meet the equivalent (and, typically, relatively low) robustness level of the guest itself. In the microkernel architecture, only the trusted microkernel runs in the highest privilege mode. Examples of embedded hypervisors using the microkernel approach include the INTEGRITY Multivisor from Green Hills Software and some variants of the open standard L4 microkernel.

Core partitioning architectures

Another important aspect of hypervisor architecture is the mechanism for managing multiple cores. The simplest approach is static partitioning (Figure 7.6), where only one virtual machine is bound permanently to each core.

Dynamic partitioning (Figure 7.7) is the next most flexible implementation, allowing virtual machines to migrate between cores. Migration of virtual machines is similar to process or thread migration in a symmetric multiprocessing (SMP) operating system. The ability to migrate workloads can be critical to performance when the hypervisor must manage more virtual machines than there are cores. This situation is similar to server-based virtualization (for instance, cloud computing) where there are more virtual servers than cores and the hypervisor must optimally schedule the servers based on dynamically changing workloads.
In embedded systems, the ability to schedule virtual machines provides obvious advantages, such as the ability to prioritize real-time workloads.

Figure 7.5: Microkernel-based Type-1 hypervisor architecture.

Dynamic partitioning also adds the ability to time-share multiple virtual machines on a single core. This approach is well suited to the previously mentioned microkernel hypervisor model because of its ability to schedule and time-share processes as well as virtual machines. Thus, specialized applications can be prioritized over virtual machines and vice versa.

Dynamic partitioning can be critical for power efficiency. Consider the simple example of a dual-core processor with two virtual machine workloads, each of which requires 50% utilization of one core. Without sharing, each core must run a virtual machine and use half of the available processing resources. Even with dynamic frequency and voltage scaling, the static energy of each core must be expended. With shareable partitioning, the hypervisor can determine that a single core can handle the load of both virtual machines. The hypervisor will then time-slice the two virtual machines on the first core and turn the second core off, reducing power consumption substantially.

Figure 7.6: Static partitioning.

Figure 7.7: Dynamic partitioning.

Dynamic multicore partitioning adds support for multicore virtual machines. For example, the hypervisor can allocate a dual-core virtual machine to two cores, host an SMP guest operating system, and take advantage of its concurrent workloads. Single-core virtual machines can be hosted on other cores (Figure 7.8).

A hypervisor may support dynamically switching the number of cores allocated to a virtual machine. This is another method for improving overall performance efficiency.
With Linux guests, the Linux hot-plug functionality can be employed by a hypervisor to force the guest into relinquishing or adding cores, based on some application-specific policy.

Figure 7.8: Dynamic multicore partitioning.

Leveraging hardware assists for virtualization

The addition of CPU hardware assists for system virtualization has been key to the practical application of hypervisors in embedded systems.

Mode hierarchy

Many processor architectures define a user and supervisor state. The operating system kernel often runs in supervisor mode, giving it access to privileged instructions and hardware such as the memory management unit (MMU), while applications run in user mode. The hardware efficiently implements transitions between applications and the operating system kernel in support of a variety of services such as peripheral I/O, virtual memory management, and application thread control.

To support hypervisors, hardware virtualization acceleration adds a third mode to run the hypervisor. In a manner analogous to the user-kernel transitions, the hypervisor mode supports transitions between user, supervisor, and hypervisor state. Another way of thinking about this tri-level hierarchy is that the hypervisor mode is analogous to the old supervisor mode that provided universal access to physical hardware state, and the guest operating systems now run in the guest supervisor mode and the guest applications in guest user mode. While an additional mode of increased privilege is the lowest common denominator for hardware virtualization assistance, numerous other hardware capabilities can dramatically improve the efficiency of system virtualization, and these may vary dramatically across CPU architectures and implementations within an architecture.

Intel VT

Intel VT, first released in 2005, has been a key factor in the growing adoption of full virtualization throughout the enterprise-computing world.
Virtualization Technology for x86 (VT-x) provides a number of hypervisor assistance capabilities, including a true hardware hypervisor mode which enables unmodified guest operating systems to execute with reduced privilege. For example, VT-x will prevent a guest operating system from referencing physical memory beyond what has been allocated to the guest's virtual machine. In addition, VT-x enables selective exception injection, so that hypervisor-defined classes of exceptions can be handled directly by the guest operating system without incurring the overhead of hypervisor software interposition. While VT technology became popular in server-class Intel chipsets, the same VT-x technology is now also available in Intel Atom embedded and mobile processors.

A second generation of Intel VT adds page table virtualization. Within Intel chipsets, this feature is called Extended Page Tables (EPT). Page table virtualization enables the guest operating system to create and manage its own virtual memory mappings. A guest operating system virtual address is mapped to a guest physical address. The guest sees only this mapping. However, guest physical addresses are mapped again to real physical addresses, configured by the hypervisor but automatically resolved by hardware.

Power architecture ISA 2.06 embedded hypervisor extensions

In 2009, the Power Architecture governance body added virtualization to the embedded specification within the Power Architecture version 2.06 instruction set architecture (ISA). At the time of writing, Freescale Semiconductor is the only embedded microprocessor vendor to have released products, including the QorIQ P4080 and P5020 multicore network processors, supporting this embedded virtualization specification. A detailed case study of Power Architecture virtualization and the P4080 can be found later in this chapter.

In 2010, ARM Ltd.
announced the addition of hardware virtualization extensions to the ARM architecture, as well as the first ARM core, the Cortex A15, to implement them. Publicly announced licensees planning to create embedded SoCs based on the Cortex A15 include Texas Instruments, Nvidia, Samsung, and ST Ericsson.

Prior to the advent of these hardware virtualization extensions, full virtualization was only possible using dynamic binary translation and instruction rewriting techniques that were exceedingly complex and unable to perform close enough to native speed to be practical in embedded systems. For example, a 2005-era x86-based desktop running Green Hills Software's pre-VT virtualization technology was able to support no more than two simultaneous full-motion audio/video clips (each in a separate virtual machine) without dropping frames. With VT-x on similar-class desktops, only the total RAM available to host multiple virtual machines generally limits the number of simultaneous clips. General x86 virtualization benchmarks showed an approximate doubling of performance using VT-x relative to the pre-VT platforms. In addition, the virtualization software layer was simplified due to the VT-x capabilities.

ARM TrustZone

An often overlooked and undervalued virtualization capability in modern ARM microprocessors is ARM TrustZone. TrustZone enables a specialized, hardware-based form of system virtualization. TrustZone provides two zones: a normal zone and a trusted, or secure, zone.

With TrustZone, the multimedia operating system (for example, what the user typically sees on a smartphone) runs in the normal zone, while security-critical software runs in the secure zone. While secure zone supervisor mode software is able to access the normal zone's memory, the reverse is not possible (Figure 7.9).
Thus, the normal zone acts as a virtual machine under control of a hypervisor running in the trust zone. However, unlike other hardware virtualization technologies such as Intel VT, the normal-zone guest operating system incurs no execution overhead relative to running without TrustZone. Thus, TrustZone removes the performance (and arguably the largest) barrier to the adoption of system virtualization in resource-constrained embedded devices.

Figure 7.9: ARM TrustZone.

TrustZone is a capability inherent in modern ARM applications processor cores, including the ARM1176, Cortex A5, Cortex A8, Cortex A9, and Cortex A15. However, it is important to note that not all SoCs using these cores fully enable TrustZone. The chip provider must permit secure zone partitioning of memory and I/O peripheral interrupts throughout the SoC complex. Furthermore, the chip provider must open the secure zone for third-party trusted operating systems and applications. Examples of TrustZone-enabled mobile SoCs are the Freescale i.MX53 (Cortex A8) and the Texas Instruments OMAP 4430 (Cortex A9).

Trusted software might include cryptographic algorithms, network security protocols (such as SSL/TLS) and keying material, digital rights management (DRM) software, a virtual keypad for trusted-path credential input, mobile payment subsystems, electronic identity data, and anything else that a service provider, mobile device manufacturer, and/or mobile SoC supplier deems worthy of protecting from the user environment.

In addition to improving security, TrustZone can reduce the cost and time to market for mobile devices that require certification for use in banking and other critical industries. With TrustZone, the bank (or certification authority) can limit certification scope to the secure zone and avoid the complexity (if not infeasibility) of certifying the multimedia operating system environment.

A secure zone operating system can further reduce the cost and certification time for two main reasons.
First, because the certified operating system is already trusted, with its design and testing artifacts available to the certification authority, the cost and time of certifying the secure zone operating environment is avoided.

Second, because the secure zone is a complete logical ARM core, the secure operating system is able to use its MMU partitioning capabilities to further divide the secure zone into meta-zones (Figure 7.10). For example, a bank may require certification of the cryptographic meta-zone used to authenticate and encrypt banking transaction messages, but the bank will not care about certifying a multimedia DRM meta-zone that, while critical for the overall device, is not used in banking transactions and is guaranteed by the secure operating system not to interfere.

Figure 7.10: TrustZone virtualization implementation with meta-zones within the trust zone.

TrustZone-enabled SoCs are able to partition peripherals and interrupts between the secure and normal states. A normal-zone general-purpose operating system such as Android cannot access peripherals allocated to the secure zone and will never see the hardware interrupts associated with those peripherals. In addition, any peripherals allocated to the normal zone are unable to access memory in the secure zone.

ARM Virtualization Extensions

With the Cortex A15, ARM Ltd. has added a general hardware hypervisor acceleration solution to ARM cores. This technology is known as the ARM Virtualization Extensions, or ARM VE. Similar to the virtualization extensions for other CPU architectures, ARM VE introduces a hypervisor mode, enabling guest operating systems to run in a de-privileged supervisor mode. ARM VE is orthogonal to TrustZone: the hypervisor mode only applies to the normal state of the processor, leaving the secure state to its traditional two-level supervisor/user mode hierarchy. Thus, the hypervisor executes de-privileged relative to the trust zone security kernel.
ARM VE also supports the concept of a single entity that can control both the trust zone and the normal-zone hypervisor mode. One example of a commercial multicore ARM SoC that supports ARM VE is the Texas Instruments OMAP5 family (based on the Cortex A15). ARM VE provides page table virtualization, similar to Intel's EPT.

Hypervisor robustness

Some tout virtualization as a technique in a layered defense for system security. The theory postulates that, since only the guest operating system is exposed to external threats, an attacker who penetrates the guest will be unable to subvert the rest of the system. In essence, the virtualization software is providing an isolation function similar to the process model provided by most modern operating systems. As multicore platforms promote consolidation, the robustness of the virtualization layer is critical in achieving overall system goals for performance, security, and availability.

However, common enterprise virtualization products have not met high robustness requirements and were never designed or intended to meet these levels. Thus, it should come as no surprise that the theory of security via virtualization has no existence proof. Rather, a number of studies of virtualization security and successful subversions of hypervisors have been published. While they look at robustness from a security perspective, the following studies are equally applicable to any multicore embedded computing system with critical requirements (real-time, safety, security, availability).

SubVirt

In 2006, Samuel King's SubVirt project demonstrated hypervisor rootkits that subverted both VMware and Microsoft VirtualPC [2].

Blue pill

The Blue Pill project took hypervisor exploits a step further by demonstrating a malware payload that was itself a hypervisor that could be installed on-the-fly beneath a natively running Windows operating system.
Since this work, a hypervisor that runs beneath and unbeknownst to another hypervisor is referred to generically as a blue pill hypervisor. Platform attestation is required in order to prevent hypervisors from being subverted in this manner. Platform attestation is the process of validating that only known good firmware (e.g., the hypervisor) is booted and controlling the computing platform at all times.

Ormandy

Tavis Ormandy performed an empirical study of hypervisor vulnerabilities. Ormandy's team of researchers generated random I/O activity into the hypervisor, attempting to trigger crashes or other anomalous behavior. The project discovered vulnerabilities in QEMU, VMware Workstation and Server, Bochs, and a pair of unnamed proprietary hypervisor products [3].

Xen owning trilogy

At the 2008 Black Hat conference, security researcher Joanna Rutkowska and her team presented their findings of a brief research project to locate vulnerabilities in Xen. One hypothesis was that Xen would be less likely to have serious vulnerabilities, as compared to VMware and Microsoft Hyper-V, due to the fact that Xen is an open-source technology and therefore benefits from the many-eyes exposure of the code base. Rutkowska's team discovered three different, fully exploitable vulnerabilities that the researchers used to commandeer the computer via the hypervisor [4]. Ironically, one of these attacks took advantage of a buffer overflow defect in Xen's Flask layer. Flask is a security framework, the same one used in SELinux, that was added to Xen to improve security.
This further underscores an important principle: software that has not been designed for and evaluated to high levels of assurance must be assumed to be subvertible by determined and well-resourced entities.

VMware's security certification

As VMware virtualization deployments in the data center have grown, security experts have voiced concerns about the implications of VM sprawl and the ability of virtualization technologies to ensure security. On June 2, 2008, VMware attempted to allay this concern with its announcement that its hypervisor products had achieved a Common Criteria EAL 4+ security certification. VMware's press release claimed that its virtualization products could now be used for sensitive government environments that demand the strictest security.

On June 5, just three days later, severe vulnerabilities in the certified VMware hypervisors were posted to the National Vulnerability Database. Among other pitfalls, the vulnerabilities allow guest operating system users to execute arbitrary code [5]. VMware's virtualization products have continued to amass severe vulnerabilities, for example CVE-2009-3732, which enables remote attackers to execute arbitrary code.

Clearly, the risk of an escape from the virtual machine layer, exposing all guests, is very real. This is particularly true of hypervisors characterized by monolithic code bases. It is important for developers to understand that use of a hypervisor does not imply highly robust isolation between virtual machines, any more than the use of an operating system with memory protection implies assured process isolation and overall system integrity.

I/O Virtualization

One of the biggest impacts on robustness and efficiency in system virtualization is the approach to managing I/O across virtual machines. This section discusses some of the pitfalls and emerging trends in embedded I/O virtualization.
An I/O virtualization architecture can be broken down into two dimensions: a peripheral virtualization architecture and a peripheral sharing architecture. First, we discuss the peripheral hardware virtualization approaches. While this discussion is germane to single-core platforms, the architectural design of I/O virtualization for multicore systems is even more critical due to the frequent concurrent use of shared I/O hardware across CPU cores.

Peripheral virtualization architectures

Emulation

The traditional method of I/O virtualization is emulation: all guest operating system accesses to device I/O resources are intercepted, validated, and translated into hypervisor-initiated operations (Figure 7.11). This method maximizes reliability. The guest operating system can never corrupt the system through the I/O device because all I/O accesses are protected via the trusted hypervisor device driver. A single device can easily be multiplexed across multiple virtual machines, and if one virtual machine fails, the other virtual machines can continue to utilize the same physical I/O device, maximizing system availability. The biggest drawback is efficiency; the emulation layer causes significant overhead on all I/O operations.

In addition, the hypervisor vendor must develop and maintain the device driver independent of the guest operating system drivers. When using emulation, the hypervisor vendor has the option to present virtual hardware that differs from the native physical hardware. As long as the guest operating system has a device driver for the virtual device, the virtual interface may yield improved performance. For example, the virtual device may use larger buffers with less inter-device communication, improving overall throughput and efficiency. The hypervisor vendor also has the option to create a virtual device that has improved efficiency properties and does not actually exist in any physical implementation.
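As a concrete illustration of the intercept-validate-translate flow, the following minimal C sketch models an emulated device register interface. The virtual UART, its register offsets, and the handler are hypothetical, invented purely for illustration; they do not correspond to any particular hypervisor or device.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical virtual UART exposed to a guest via emulation.
 * Offsets and semantics are illustrative only. */
#define VUART_DATA 0x00u /* write: transmit one byte         */
#define VUART_CTRL 0x04u /* write: enable/disable the device */

struct vuart {
    int enabled;
    uint8_t last_tx; /* virtual device state kept by the hypervisor */
};

/* Trusted hypervisor driver: the only code that touches real
 * hardware. Here it just logs, standing in for a physical access. */
static void phys_uart_tx(uint8_t byte)
{
    printf("phys tx: 0x%02x\n", byte);
}

/* Called when the CPU traps a guest store into emulated MMIO space.
 * Every access is intercepted, validated, then translated into a
 * hypervisor-initiated operation. Returns 0 on success, -1 for an
 * access the policy rejects. */
int vuart_mmio_write(struct vuart *dev, uint32_t offset, uint32_t value)
{
    switch (offset) {
    case VUART_CTRL:
        dev->enabled = (int)(value & 1u);
        return 0;
    case VUART_DATA:
        if (!dev->enabled)
            return -1;               /* reject: device not enabled */
        dev->last_tx = (uint8_t)value;
        phys_uart_tx(dev->last_tx);  /* hypervisor performs real I/O */
        return 0;
    default:
        return -1;                   /* reject: unknown register */
    }
}
```

Because every access funnels through one trusted routine, the same handler can also enforce per-guest multiplexing policy, at the cost of a trap on each register access.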
While this paravirtualization approach can yield improved performance, any gains are offset by the requirement to maintain a custom device driver for the guest.

Pass-through

In contrast, a pass-through model (Figure 7.12) gives a guest operating system direct access to a physical I/O device. Depending on the CPU, the guest driver can either be used without modification or with minimal paravirtualization. The pass-through model provides improved efficiency but trades off robustness: an improper access by the guest can take down any other guest, application, or the entire system. This model violates a primary goal of system virtualization: isolation of virtual environments for safe coexistence of multiple operating system instances on a single multicore computer.

Figure 7.11 I/O virtualization using emulation.

If present, an IOMMU enables a pass-through I/O virtualization model without risking direct memory accesses beyond the virtual machine's allocated memory. As the MMU enables the hypervisor to constrain memory accesses of virtual machines, the IOMMU constrains I/O memory accesses (especially DMA), whether they originate from software running in virtual machines or the external peripherals themselves (Figure 7.13).

IOMMUs are becoming increasingly common in embedded multicore microprocessors, such as Intel Core, Freescale QorIQ, and ARM Cortex A15. Within Intel processors, the IOMMU is referred to as Intel Virtualization Technology for Directed I/O (Intel VT-d). On Freescale's virtualization-enabled QorIQ processors such as the eight-core P4080, the IOMMU is referred to as the Peripheral Access Management Unit (PAMU).

Figure 7.12 I/O virtualization using pass-through.

Figure 7.13 IOMMU is used to sandbox device-related memory accesses.

On Cortex A15 (and other ARM cores that support the ARM Virtualization Extensions), the IOMMU is not part of the base virtualization specification. Rather, ARM Ltd.
has a separate intellectual property offering, called a System MMU, which is optionally licensable by ARM semiconductor manufacturers. In addition, the manufacturer may use a custom IOMMU implementation instead of the ARM System MMU. Separately, ARM TrustZone provides a form of IOMMU between the normal and secure zones of an ARM processor: normal-zone accesses made by the CPU or by peripherals allocated to the normal zone are protected against accessing memory in the secure zone.

The IOMMU model enables excellent performance efficiency with increased robustness relative to a pass-through model without IOMMU. However, IOMMUs are a relatively new concept. A number of vulnerabilities (ways to circumvent protections) have been discovered in IOMMUs and must be worked around carefully with the assistance of your systems software/hypervisor supplier. In most vulnerability instances, a faulty or malicious guest is able to circumvent the hypervisor isolation via device, bus, or chipset-level operations other than direct memory access. Researchers at Green Hills Software, for example, have discovered ways for a guest operating system to access memory beyond its virtual machine, deny execution service to other virtual machines, blue pill the system hypervisor, and take down the entire computer, all via IOMMU-protected I/O devices. For high-reliability applications, the IOMMU must be applied in a different way than the traditional pass-through approach where the guest operating system has unfettered access to the I/O device.

A major downside of a pass-through approach (with or without the IOMMU) is that it prevents robust sharing of a single I/O device across multiple virtual machines. The virtual machine that is assigned ownership of the pass-through device has exclusive access, and any other virtual machines must depend upon the owning virtual machine to forward I/O.
If the owning virtual machine is compromised, all virtual machines are denied service for that device.

Mediated pass-through

Mediated pass-through is a special case of the pass-through approach described above. Similar to pure pass-through, the guest operating system device drivers are reused (unmodified). However, the hypervisor traps accesses that are system security/reliability relevant (for example, programming of DMA registers). The hypervisor can validate the access and pass through, modify, or reject the access based on policy (e.g., is the DMA accessing a valid virtual machine memory location?). The additional performance overhead of mediated pass-through relative to pure pass-through depends very much upon the specific hardware device and its I/O memory layout. The trade-off in reliability and performance is often acceptable. The hypervisor usually contains only a small amount of device-specific logic used to perform the mediation; a complete hypervisor device driver is not required.

Peripheral sharing architectures

In any embedded system, invariably there is a need for sharing a limited set of physical I/O peripherals across workloads. The embedded operating system provides abstractions, such as layer two, three, and four sockets, for this purpose. Sockets provide, in essence, a virtual interface for each process requiring use of a shared network interface device. Similarly, in a virtualized system, the hypervisor must take on the role of providing a secure virtual interface for accessing a shared physical I/O device.
Arguably the most difficult challenge in I/O virtualization is the task of allocating, protecting, sharing, and ensuring the efficiency of I/O across multiple virtual machines and applications.

Hardware multiplexing

This deficiency has led to emerging technologies that provide an ability to share a single I/O device across multiple guest operating systems using the IOMMU and hardware partitioning mechanisms built into the device I/O complex (e.g., chipset plus the peripheral itself). One example of shareable, IOMMU-enabled, pass-through devices is Intel processors equipped with Intel Virtualization Technology for Connectivity (Intel VT-c) coupled with PCI Express Ethernet cards implementing Single-Root I/O Virtualization (SR-IOV), a PCI-SIG standard. With such a system, the hardware takes care of providing independent I/O resources, such as multiple packet buffer rings, and some form of quality of execution service amongst the virtual machines. This mechanism lends itself well to networking devices such as Ethernet, RapidIO, and FibreChannel; however, other approaches are required for robust, independent sharing of peripherals such as graphics cards, keyboards, and serial ports. Nevertheless, it is likely that hardware-enabled, IOMMU-protected, shareable network device technology will grow in popularity across embedded processors.

Hypervisor software multiplexing

When using an emulated peripheral virtualization approach, it is often natural to also employ the hypervisor to provide sharing of the peripheral across multiple virtual machines. Guest I/O accesses are interposed by the hypervisor and then funneled to/from the physical device based on policy. Hypervisor multiplexing enables the enforcement of quality of service (QoS) and availability that is not possible when a pass-through or even a mediated pass-through approach is used.
In the mediated/unmediated pass-through approaches, one guest must be trusted for availability and quality of service. If that guest fails, then all use of the shared peripheral is lost. Hypervisor multiplexing incurs minimal performance overhead associated with analyzing and moving data appropriately between virtual machines.

Bridging

Regardless of which peripheral virtualization approach is employed, bridging can be used to provide one guest with access to I/O by using a peripheral operated by another guest. For example, a master guest that has access to a virtual Ethernet device (virtualized using any of the aforementioned methods) can offer an Ethernet or IP bridge to other slave guests. Similar techniques can be used for other stream/block-oriented I/O. Bridging is commonly used when pass-through or mediated pass-through is employed for such devices. Many modern operating systems, including Linux, provide convenient facilities for establishing virtual interface adapters that the hypervisor can use to make sharing seamless in the guest device driver model. The hypervisor must still provide the underlying inter-VM communication facilities. In some cases, this communication will include additional intelligence such as network address translation (NAT) routing for an IP bridge. Bridging can provide a simpler virtualization solution than a hypervisor-multiplexed solution, with the trade-off that a guest must be trusted for system availability and quality of service.

Dom0/console guest

Introduced earlier in this chapter, the console or Dom0 guest virtualization architecture employs a special case of bridging in which the master is a guest dedicated for the purpose of providing I/O services to the system's other virtual machines.
A master I/O guest approach can provide additional robustness protection relative to bridging since the master is shielded by the hypervisor from the software environments of other guests.

Combinations of I/O virtualization approaches

Various combinations of peripheral virtualization and sharing architectures have advantages and disadvantages. While many of these trade-offs have been discussed in the individual sections above, it is also important to remember the overall system design may require employment of multiple architectures and approaches. For example, while paravirtualization and hypervisor multiplexing may be required when a high-performance CPU is shared between virtual machines, a pass-through or mediated pass-through approach may provide the best performance, cost, and reliability trade-offs for some devices in some systems. Furthermore, the impact on overall system performance of an I/O virtualization approach varies based on many factors, including the application's use of the peripheral, the CPU (speed, bus speed, hardware virtualization capability, etc.), the type of device (block/streamed, low-latency, etc.), and more. The subject of I/O virtualization is indeed complex, and striking the proper balance for performance, reliability, and maintainability may require advice from your system virtualization supplier.

I/O virtualization within microkernels

As discussed earlier, virtual device drivers are commonly employed by microkernel-style operating systems.
Microkernel-based hypervisors are also well suited to secure I/O virtualization: instead of the typical monolithic approach of placing device drivers into the hypervisor itself or into a special-purpose Linux guest operating system (the Dom0 method described earlier), the microkernel-based hypervisor uses small, reduced-privilege, native processes for device drivers, I/O multiplexors, health managers, power managers, and other supervisory functions required in a virtualized environment. Each of these applications is provided only the minimum resources required to achieve its intended function, fostering robust embedded system designs. Figure 7.14 shows the system-level architecture of a microkernel-based hypervisor used in a multicore networking application that must securely manage Linux control-plane functionality alongside high-throughput, low-latency data-plane packet processing within virtual device drivers.

Without virtualization, the above application could be implemented with a dual Linux/RTOS configuration in which the control- and data-plane operating systems are statically bound to a set of independent cores. This is referred to as an asymmetric multiprocessing (AMP) approach. One advantage of virtualization over an AMP division of labor is the flexibility of changing the allocation of control- and data-plane workloads to cores. For example, in a normal mode of operation, the architect may only want to use a single core for control and all other cores for data processing. However, the system can be placed into a management mode in which Linux needs four cores (SMP) while the data processing is temporarily limited.
The virtualization layer can handle the reallocation of cores seamlessly under the hood, something that a static AMP system cannot support.

Figure 7.14 Virtual device drivers in microkernel-based system virtualization architecture.

Case study: Power Architecture virtualization and the Freescale P4080

Power Architecture has included hardware virtualization support since 2001 in its server-based instruction set architecture (ISA). However, with the advent of Power Architecture ISA 2.06, hardware virtualization is now found in general-purpose embedded processors. The eight-core Freescale P4080 is an example of a commercial embedded processor with Power Architecture 2.06 embedded hardware virtualization compatibility. The P4080 was the first microprocessor to provide hardware virtualization capability in a multicore configuration targeted for embedded applications.

Hypervisor performance is highly dependent upon hardware support for virtualization. With the advent of Power Architecture 2.06's ISA Book III-E [6], virtualization hardware support is now available in embedded Power Architecture-based processors such as the Freescale P4080 [7].

Power Architecture hardware hypervisor features

Power Architecture ISA 2.06 embedded virtualization capabilities include a new hypervisor mode, guest interrupt injection, guest translation look-aside buffer (TLB) management, and inter-partition communication mechanisms.

The embedded hypervisor mode converts the traditional two-level user/supervisor mode architecture to a three-level guest-user/guest-supervisor/hypervisor mode hierarchy described earlier. The hypervisor level enables hypervisor-mediated access to hardware resources required by guest operating systems. When a guest performs a privileged operation, such as a TLB modification, the processor can be configured to trap this operation to the hypervisor. The hypervisor can then decide whether to allow, disallow, or modify (emulate) the guest operation.
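The allow/disallow/emulate decision applied after such a trap can be sketched in C. The trap decoding, the partition descriptor, and the policy below are hypothetical illustrations, not an actual Power Architecture hypervisor interface.

```c
#include <stdint.h>

/* Hypothetical decode of a trapped privileged operation; the field
 * names and policy are invented for illustration. */
enum priv_op { OP_TLB_WRITE, OP_CACHE_LOCK, OP_OTHER };

struct trap {
    enum priv_op op;
    uint64_t guest_phys; /* address the guest is trying to map */
};

struct partition {
    uint64_t mem_base; /* physical memory allocated to this VM */
    uint64_t mem_size;
};

/* Possible outcomes when a guest privileged operation traps. */
enum verdict { ALLOW, EMULATE, REJECT };

enum verdict handle_priv_trap(const struct partition *p,
                              const struct trap *t)
{
    switch (t->op) {
    case OP_TLB_WRITE:
        /* Only let the guest map pages inside its own partition. */
        if (t->guest_phys >= p->mem_base &&
            t->guest_phys < p->mem_base + p->mem_size)
            return EMULATE; /* hypervisor performs the TLB fill */
        return REJECT;
    case OP_CACHE_LOCK:
        return ALLOW;       /* permitted directly by configuration */
    default:
        return REJECT;
    }
}
```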
The hypervisor mode also provides for an additional guest processor state, enabling guest operating systems to access partition-specific logical registers without trapping to the hypervisor.

Guest interrupt injection enables the hypervisor to control which interrupts and exceptions cause a trap into hypervisor mode for hypervisor handling, and which interrupts can be safely redirected to a guest, bypassing the hypervisor. For example, an I/O device that is shared between multiple partitions may require hypervisor handling, while a peripheral dedicated to a specific partition can be handled directly by the partition's guest operating system. Interrupt injection is critical for achieving real-time response when relying on a guest RTOS for real-time workloads.

TLB and page table management is one of the most performance-critical aspects of hardware virtualization support. The TLB and page tables govern access to physical memory and therefore must be managed by some combination of virtualization hardware and software to enforce partitioning. Without any special hardware support, attempts to manipulate the hardware TLB or page tables via supervisor mode instructions are trapped and emulated by the hypervisor. Depending on the dynamic nature of a guest operating system, this mediation can cause performance problems.

This mediation is avoided in microprocessors that support a two-level, virtualization-enabled page table system in hardware. In such a system, the guest operating system is able to manage its TLB and page tables directly, without hypervisor assistance. The hypervisor only needs to configure the system to ensure that each guest's page tables reference memory allocated to the guest's virtual machine. ARM Virtualization Extensions (ARM VE) and Intel chipsets that support the second-generation VT-x with Extended Page Tables (EPT) are examples of hardware virtualization implementations that avoid the bottleneck of hypervisor page table management.
Power ISA 2.06 does not currently specify such a two-level, guest-managed page table architecture. Thus, the approach taken by the hypervisor in managing guest page tables may be critical for overall system performance; the approaches are described later in this chapter.

Power Architecture ISA 2.06 supports doorbell messages and interrupts that enable a guest operating system to communicate with software outside its virtual machine, without involving the hypervisor. For example, a control-plane guest operating system can efficiently inform data-plane applications of changes that might affect how packets are processed. Besides being a general shoulder-tap mechanism for the system, the hypervisor can use doorbells to reflect asynchronous interrupts to a guest. Asynchronous interrupt management is complicated by the fact that an interrupt cannot be reflected until the guest is ready to accept it. When an asynchronous interrupt occurs, the hypervisor may use the Power Architecture msgsnd instruction to send a doorbell interrupt to the appropriate guest. Doorbell exceptions remain pending until the guest has set the appropriate doorbell enable bit. When the original exception occurs, it is first directed to the hypervisor, which can now safely reflect the interrupt using the asynchronous doorbell.

Power Architecture also provides additional facilities for managing interrupts between the hypervisor and guests. For example, if the hypervisor is time-slicing multiple guest operating systems, a scheduling timer may be used directly by the hypervisor. If a guest operating system is given exclusive access to a peripheral, then that peripheral's interrupts can be directed to the guest, avoiding the latency of trapping first to the hypervisor before reflecting the interrupt back up to a guest.

Power Architecture defines a new set of guest registers that mirror prior non-guest versions to enable intelligent interrupt management.
These registers include:

- Guest Save/Restore registers (GSRR0/1)
- Guest Interrupt Vector Prefix Register (GIVPR)
- Guest Interrupt Vector Registers (GIVORn)
- Guest Data Exception Address Register (GDEAR)
- Guest Exception Syndrome Register (GESR)
- Guest Special Purpose Registers (GSPRG0..3)
- Guest External Proxy (GEPR)
- Guest Processor ID Register (GPIR)

Most of these registers provide state information important to servicing exceptions and interrupts and duplicate those available in hypervisor mode (or in prior Power Architecture implementations without guest mode). These registers have different offsets from the originals. To ensure legacy operating systems can run unchanged while in guest mode, the processor maps references to the original non-guest versions to the guest versions.

When an external interrupt is directed to a guest and the interrupt occurs while executing in hypervisor mode, the interrupt remains pending until the processor returns to guest mode. The hardware automatically handles this transition.

Power Architecture introduces some additional exceptions and interrupts to support hypervisors. To allow a paravirtualized guest operating system to call the hypervisor, a special system call causes the transition between guest and hypervisor mode. It uses the existing Power Architecture system call instruction but with a different operand.

Some exceptions and interrupts are always handled by the hypervisor. When a guest tries to access a hypervisor-privileged resource such as the TLB, the exception always triggers execution in hypervisor mode and allows the hypervisor to check protections and virtualize the access when necessary. In the case of cache access, normal cache load and store references proceed normally.
A mode exists for guests to perform some direct cache management operations, such as cache locking.

Power Architecture virtualized memory management

One important role of hypervisors in virtualization is managing and protecting virtual machines. The hypervisor must ensure that a virtual machine has access only to its allocated resources and must block unauthorized accesses. To perform this service, the hypervisor directly manages the MMU of all cores and, in many implementations, also manages an IOMMU for all I/O devices that master memory transactions within the SoC.

In the case of the MMU, the hypervisor typically adopts one of two primary management strategies. The simplest approach from the hypervisor's perspective is to reflect all memory management interrupts to the guest. The guest will execute its usual TLB-miss handling code, which includes instructions to load the TLB with a new page table entry. These privileged TLB manipulation instructions will, in turn, trap to the hypervisor so that it can ensure the guest loads only valid pages.

Alternatively, the hypervisor can capture all TLB manipulations by the guest (or take equivalent paravirtualized hypercalls from the guest) to emulate them. The hypervisor builds a virtual page table for each virtual machine. When a TLB miss occurs, the hypervisor searches the virtual page table for the relevant entry. If one is found, it performs a TLB fill directly and returns from the TLB miss exception without generating an exception to the guest. This process improves performance by precluding the need for the hypervisor to emulate the TLB manipulation instructions that the guest would execute when handling the TLB miss.
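A minimal C sketch of this second strategy follows, assuming a simple linear virtual page table; a real hypervisor would use the Power MMU's actual structures and a faster lookup. All names here are hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

#define PAGE_SHIFT 12

/* Hypothetical per-VM virtual page table entry maintained by the
 * hypervisor as it captures guest TLB manipulations. */
struct vpte {
    uint64_t guest_va; /* guest virtual page address */
    uint64_t host_pa;  /* real physical page address */
    int valid;
};

struct vm {
    struct vpte *vpt;
    size_t nentries;
};

/* Stand-in for the privileged TLB-fill operation. */
static void tlb_fill(uint64_t va, uint64_t pa) { (void)va; (void)pa; }

/* TLB-miss handler: search the VM's virtual page table; on a hit,
 * fill the TLB directly and resume the guest with no guest-visible
 * exception. On a miss, return 0 so the caller reflects the fault
 * to the guest for its normal page-fault handling. */
int handle_tlb_miss(const struct vm *vm, uint64_t fault_va)
{
    uint64_t page = (fault_va >> PAGE_SHIFT) << PAGE_SHIFT;
    for (size_t i = 0; i < vm->nentries; i++) {
        const struct vpte *e = &vm->vpt[i];
        if (e->valid && e->guest_va == page) {
            tlb_fill(page, e->host_pa);
            return 1; /* handled: resume guest directly */
        }
    }
    return 0;         /* not found: reflect fault to guest */
}
```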
These TLB management alternatives are a great example of how hypervisor implementation choices can cause dramatic differences in performance and why it is important to consult your hypervisor vendor to understand the trade-offs.

To allow the physical TLB to simultaneously carry entries for more than one virtual machine, the TLB's virtual addresses are extended with a partition identifier. Furthermore, to allow the hypervisor itself to have entries in the TLB, a further bit of address extension is used for hypervisor differentiation. In Power Architecture, the TLB's virtual address is extended by an LPID value that is a unique identifier for a logical partition (virtual machine). The address is further extended with the GS bit that designates whether an entry is valid for guest or hypervisor mode. These extensions allow the hypervisor to remap guest logical physical addresses to actual physical addresses. Full virtualization is then practical because guest operating systems often assume that their physical address space starts at 0.

Freescale P4080 IOMMU

As described earlier, an IOMMU mediates all address-based transactions originating from mastering devices. Although IOMMUs are not standardized in Power Architecture as of ISA version 2.06, as previously mentioned, the P4080 incorporates an IOMMU (named the PAMU, Peripheral Access Management Unit, by Freescale). DMA-capable peripherals such as Ethernet or external storage devices can be safely dedicated for use by a virtual machine without violating the system partitioning enforced by the hypervisor. Any attempt to DMA into physical memory not allocated to the virtual machine will fault to the hypervisor.
The IOMMU keeps internal state that relates the originator of the transaction with authorized address regions and associated actions. The hypervisor initializes and manages the IOMMU to define and protect logical partitions in the system.

Hardware support for I/O sharing in the P4080

How hardware resources on a multicore SoC are allocated, and possibly shared, across virtual machines is one of the most important performance attributes of the system. Resources dedicated to a single virtual machine can often execute at full speed, without hypervisor inter-positioning. However, in many systems, at least some of the limited hardware resources must be shared. For example, a single Ethernet device may need to be multiplexed between virtual machines by the hypervisor. The hypervisor can handle the Ethernet interrupts, transmit packets on behalf of the virtual machines, and forward received packets to the appropriate virtual machine. In theory, the hypervisor could even implement firewall or other network services between the virtual machines and the physical network.

Certain applications require shared hardware access but cannot tolerate the overhead of hypervisor mediation. Modern microprocessors are rising to this challenge by providing hardware-mediation capabilities designed specifically for virtualized environments. For example, the P4080 implements encryption and pattern-matching engines that support a variety of networking functions. The ability of multiple partitions to share these resources is an important feature of the P4080.

The P4080 provides multiple dedicated hardware portals used to manage buffers and submit and retrieve work via queues. By dedicating a portal to a virtual machine, the guest interacts with the encryption and pattern-matching engines as if the guest completely owned control of the devices. In the P4080, the Ethernet interfaces participate in this same hardware queue and buffer management facility.
Inbound packets can be classified and queued to a portal owned by a virtual machine without involving hypervisor software during reception. Similarly, each virtual machine can submit packets for transmission using its private portal.

Virtual machine configuration in Power Architecture

The Embedded Power Architecture Platform Requirements (ePAPR) specification [8] standardizes a number of hypervisor management facilities, including a device tree data structure that represents hardware resources. Resources include available memory ranges, interrupt controllers, processing cores, and peripherals. The device tree may represent the virtual hardware resources that a virtual machine will see or the physical resources that a hypervisor sees. In addition, ePAPR defines a boot image format that includes the device tree. ePAPR enables Power Architecture bootloaders to understand and boot hypervisors from multiple vendors, and provides a standardized approach for hypervisors to instantiate virtual machines.

Power Architecture's device-tree approach addresses what many believe to be the most significant barrier to adoption of system virtualization technology in multicore embedded systems. Without this form of configuration standardization, hypervisor vendors create proprietary interfaces and mechanisms for managing virtual hardware. That is certainly the case on Intel and ARM architecture-based systems at this time of writing. The lack of standardized hypervisor configuration and management promotes vendor lock-in, something that ultimately hurts rather than helps those same vendors.

The device-tree configuration approach is gaining increased adoption in the Linux community and is at least an optional capability supported for most Linux CPU architectures. The next step is to promulgate this approach to virtualized Linux operating systems and then to other types of operating systems.
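To make the device-tree idea concrete, the following sketch models a tree of named nodes, each carrying a simplified "reg" (base, size) property, and a lookup over it. This is a toy in-memory model under assumed names; a real ePAPR device tree is a flattened binary blob with standardized properties such as reg, compatible, and interrupts.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Illustrative in-memory model of a device-tree node: a name, a
 * simplified (base, size) "reg" property, and child/sibling links. */
struct dt_node {
    const char *name;
    uint64_t reg_base, reg_size;
    const struct dt_node *child;    /* first child */
    const struct dt_node *sibling;  /* next sibling */
};

/* Depth-first search for a node by name, as a hypervisor or guest might
 * do when locating a resource described in its configuration. */
const struct dt_node *dt_find(const struct dt_node *root, const char *name)
{
    if (root == NULL)
        return NULL;
    if (strcmp(root->name, name) == 0)
        return root;
    const struct dt_node *hit = dt_find(root->child, name);
    return hit ? hit : dt_find(root->sibling, name);
}
```

A virtual machine's tree might contain a "memory" node granting 256 MB at address 0 (matching the guest assumption, noted earlier, that physical memory starts at 0) alongside nodes for the cores and peripherals it has been assigned.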
At the time of writing, support for standardizing virtual machine configuration is being considered in Linaro (which is quickly adopting device trees) as well as in the GENIVI automotive alliance.

Example use cases for system virtualization

The use of virtualization in multicore embedded systems is nascent, yet incredibly promising. This section covers just a few of the emerging applications.

Telecom blade consolidation

Virtualization enables multiple embedded operating systems, such as Linux and VxWorks, to execute on a single multicore processor, such as the Freescale P4080. In addition, the microkernel-based virtualization architecture enables hard real-time applications to execute natively. Thus, control-plane and data-plane applications, typically requiring multiple blades, can be consolidated. Telecom consolidation provides the same sorts of size, weight, power, and cost efficiencies that enterprise servers have enjoyed with VMware.

Electronic flight bag

An electronic flight bag (EFB) is a general-purpose computing platform that flight crews use to perform flight management tasks, including calculating take-off parameters and viewing navigational charts, more easily and efficiently. EFBs replace the stereotypical paper-based flight bags carried by pilots. There are three classes of EFBs, with Class 3 being a device that interacts directly with the onboard avionics and requires airworthiness certification.

Using a safety-rated hypervisor, a Class 3 EFB can provide a typical Linux desktop environment (e.g., OpenOffice.org) for pilots while hosting safety-critical applications that validate parameters before they are input into the avionics. Virtualization enables Class 3 EFBs to be deployed in the portable form factor that is critical for a cramped cockpit.

Intelligent Munitions System

Intelligent Munitions System (IMS) is a next-generation US military net-centric weapon system.
One component of IMS includes the ability to dynamically alter the state of munitions (for example, mines) to meet the requirements of an evolving battlescape. Using a safety-rated hypervisor, the safety-critical function of programming the munitions and providing a trusted display of weapons state for the soldier is handled by secure applications running on the safety-certified hypervisor or trusted guest operating system. A standard Linux graphical interface is enabled with virtualization.

Automotive infotainment

Demand for more advanced infotainment systems is growing rapidly. In addition to theater-quality audio and video and GPS navigation, wireless networking and other office technologies are making their way into the car. Despite this increasing complexity, passenger expectations for instant-on and high availability remain. At the same time, automobile systems designers must always struggle to keep cost, weight, power, and component size to a minimum.

Automobile passengers expect the radio and other traditional head-unit components to never fail. In fact, a failure in one of these components is liable to cause an expensive (for the automobile manufacturer) visit to the repair shop. Even worse, a severe design flaw in one of these systems may result in a recall that wipes out the profit on an entire vintage of cars. Exacerbating the reliability problem is a new generation of security threats: bringing the Internet into the car exposes it to all the viruses and worms that target general-purpose operating systems.

One currently deployed solution, found on select high-end automobiles, is to divide the infotainment system onto two independent hardware platforms, placing the high-reliability, real-time components onto a computer running a real-time operating system, and the rear-seat infotainment component on a separate computer.
This solution is highly unworkable, however, because of the need to tightly constrain component cost, size, power, and weight within the automobile.

System virtualization provides an ideal alternative. Head-unit applications run under the control of a typical infotainment operating system such as GENIVI Linux or Android. In the back seat, each passenger has a private video monitor. One passenger could even reboot without affecting the second passenger's Internet browsing session (e.g., two GENIVI operating systems). Meanwhile, critical services such as rear-view camera, advanced driver assistance systems (ADAS), and 3D clusters can be hosted on the same multicore processor, using a hypervisor that supports hard real-time, safety-critical applications and a virtualized, multiplexed graphics processing unit (GPU), as shown in Figure 7.15.

Medical imaging system

Medical scanners typically consist of three distinct subsystems: an operator console, a real-time data-acquisition component, and a high-end graphical display. Sometimes the system is implemented with three separate computers in order to avoid any resource conflicts and guarantee real-time performance. System virtualization, however, can consolidate these systems onto a single multicore processor. For example, a desktop Linux OS can be used for the operator console while specialized real-time operating systems are used to handle the graphics rendering and data acquisition. This architecture can provide the required isolation and performance guarantees while dramatically reducing the size, weight, power, and cost of the system.

Conclusion

Increases in software and system complexity and connectivity are driving the evolution in how embedded systems manage I/O and in the architecture of the operating systems and hypervisors that are responsible for ensuring their security.
The combination of reduced-privilege, component-based designs and intelligent I/O virtualization to enable secure consolidation without sacrificing efficiency shall remain a focus of systems software suppliers in meeting the flexibility, scalability, and robustness demands of next-generation embedded systems.

Figure 7.15 Automotive consolidation of center stack/head-unit and critical functions.

References

[1] Whitaker, et al. Denali: lightweight virtual machines for distributed and networked applications. USENIX Annual Technical Conference, 2002.
[2] S. King, et al. SubVirt: implementing malware with virtual machines. IEEE Symposium on Security and Privacy, 2006.
[3] T. Ormandy. An empirical study into the security exposure to hosts of hostile virtualized environments. 2006.
[4] J. Rutkowska, A. Tereshkin, R. Wojtczuk. Detecting and preventing the Xen hypervisor subversions; Bluepilling the Xen hypervisor; Subverting the Xen hypervisor. Black Hat USA, 2008.
[5] CVE-2008-2100, National Vulnerability Database.
[6] Available from:
[7] Available from: Freescale P4080 product page.
[8] Embedded Power Architecture Platform Requirements (ePAPR) specification.

