OK Labs - Virtualization as the Nexus of Multicore Power Management

Post on 14-Dec-2014


Category:

Technology


DESCRIPTION

ARM TechCon session "Virtualization as the Nexus of Multicore Power Management", Thursday, November 11, 2010. Adoption of multicore technology for desktop, data center and embedded designs responds to comparable needs: to scale compute capacity without stepping up system clocks, and to attain more MIPS per watt for devices and applications. Multicore for the desktop and data center enjoys mature support from deployed OSes. Even as embedded OSes become more adept at running on multicore CPUs, applications and middleware still face challenges of thread safety, concurrency and load balancing. Mobile virtualization is a means to get maximum value from multicore ARM designs, at both the architectural and the application level. This session examines multicore use cases for virtualization, and how virtualization brings superior CPU utilization, greater security, smoother legacy migration and smarter energy management to multicore designs.

TRANSCRIPT

November 9-11, 2010

The Santa Clara Convention Center

www.armtechcon.com

Energy Management for Mobile Devices

Power to the Microvisor!

• Energy-management basics

• Virtualization

• Enter multicore

• Summary

Overview

Device uses energy

• Drains battery

Goal of energy management:

• Maximize battery life

Energy in Mobile Devices

Dynamic voltage and frequency scaling

CMOS power consumption:

• P = Pdyn + Pstat

• Pdyn ∝ f V²

• Vmin ∝ f (very approximately)

Assuming execution time T ∝ 1/f:

• Edyn = Pdyn T ∝ f V² / f = V² ∝ f²

• lower frequency ⇒ lower dynamic energy

Energy-Management Mechanisms: DVFS
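The scaling argument on this slide can be checked numerically. The sketch below is illustrative only: the function name, the constant k and the normalisation (V = 1 at 1 GHz) are assumptions, not values from the talk.

```python
def dynamic_energy(freq_ghz, work_gcycles=1.0, k=1.0):
    """Relative dynamic energy for a fixed amount of CPU work.

    Uses the slide's simplifications: Pdyn ∝ f·V², V ∝ f, T ∝ 1/f.
    k and the voltage normalisation are illustrative assumptions.
    """
    voltage = freq_ghz                      # V ∝ f, normalised to 1 at 1 GHz
    p_dyn = k * freq_ghz * voltage ** 2     # Pdyn ∝ f·V²
    exec_time = work_gcycles / freq_ghz     # T ∝ 1/f
    return p_dyn * exec_time                # Edyn = Pdyn·T ∝ f²


for f in (0.5, 1.0, 2.0):
    print(f"{f:.1f} GHz -> relative Edyn {dynamic_energy(f):.2f}")
```

Under these assumptions, doubling the frequency quadruples the dynamic energy spent on the same work.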

When CPU is idle, turn clock off

• Pdyn = 0 ⇒ P = Pstat

Sleep states reduce power further:

• Psleep < Pstat

Typically have multiple sleep states

• shallow sleep states save some energy but are fast to enter/exit

• deep sleep states save more energy but lose state and are expensive to enter/exit

Complex tradeoff

Mechanisms: Sleep States

Edyn ∝ f² ⇒ lowest frequency is best

Ignores static energy!

• E = Edyn + Estat

• Edyn ∝ f²

• Estat = Pstat T ∝ 1/f

Low f increases execution time ⇒ Estat increases at low f!

Popular Approach: Lowest Frequency
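A small sweep shows why the lowest frequency is not automatically the best once static energy is counted. This is a sketch under assumed numbers: the coefficients a and b and the list of setpoints are made up for illustration.

```python
def total_energy(f, a=1.0, b=1.0):
    """E = Edyn + Estat with Edyn ∝ f² and Estat ∝ 1/f (a, b are assumed)."""
    e_dyn = a * f ** 2
    e_stat = b / f
    return e_dyn + e_stat


# Hypothetical normalised frequency setpoints.
setpoints = [0.2, 0.4, 0.6, 0.8, 1.0, 1.2]
best = min(setpoints, key=total_energy)

for f in setpoints:
    print(f"f={f:.1f}  E={total_energy(f):.3f}")
print("lowest-energy setpoint:", best)
```

With these coefficients the minimum sits at an intermediate setpoint; where it lands on real hardware depends on the ratio of static to dynamic power.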

Run at maximum f, then go to sleep

• Tries to minimize static power, but:

• dynamic power isn’t irrelevant (yet)

• T ∝ 1/f isn’t correct either: it ignores memory!

Effect of memory stalls

• T = TCPU + Tmem

• TCPU ∝ 1/f

• Tmem = const

• Estat ∝ T, where T = k/f + const for some workload-dependent k

Ignores sleep energy!

Other Approach: “Race to Halt”
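The effect of memory stalls on T can be sketched as follows. The cycle counts and stall times are invented for illustration, and Tmem is treated as fully independent of the CPU clock, as on the slide.

```python
def exec_time(f_hz, cpu_cycles, t_mem_s):
    """T = TCPU + Tmem, with TCPU = cycles / f and Tmem constant."""
    return cpu_cycles / f_hz + t_mem_s


def speedup(f_low_hz, f_high_hz, cpu_cycles, t_mem_s):
    """How much faster the workload runs at f_high than at f_low."""
    return (exec_time(f_low_hz, cpu_cycles, t_mem_s)
            / exec_time(f_high_hz, cpu_cycles, t_mem_s))


# Hypothetical workloads: doubling the clock from 0.5 GHz to 1 GHz.
cpu_bound = speedup(0.5e9, 1e9, cpu_cycles=1e9, t_mem_s=0.0)
memory_bound = speedup(0.5e9, 1e9, cpu_cycles=1e8, t_mem_s=0.9)

print(f"CPU-bound speedup:    {cpu_bound:.2f}x")
print(f"memory-bound speedup: {memory_bound:.2f}x")
```

The CPU-bound case doubles in speed, while the memory-bound case barely changes, so racing at maximum f buys it very little extra sleep time.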

Run at maximum f, then go to sleep

Earlier completion ⇒ longer sleep

• E = Edyn + Estat + Esleep

• Esleep = Psleep Tsleep

• Tsleep = T0 - T

• Esleep = Psleep (T0 - T)

Still ignores dynamic energy!

Other Approach: “Race to Halt” (2)
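Putting the three terms together over a fixed period T0 shows how the depth of the sleep state decides whether racing to halt pays off. All constants here (a, p_stat, the two sleep powers, T0) are assumptions chosen for illustration, and the workload is taken to be purely CPU-bound.

```python
def energy_over_period(f, p_sleep, t0=2.0, work=1.0, a=0.2, p_stat=0.5):
    """E = Edyn + Estat + Esleep over a period t0 (normalised units).

    Assumes Pdyn = a·f³ (Pdyn ∝ f·V² with V ∝ f) and T = work / f.
    """
    t = work / f
    if t > t0:
        raise ValueError("work does not finish within the period")
    e_dyn = a * f ** 3 * t
    e_stat = p_stat * t
    e_sleep = p_sleep * (t0 - t)
    return e_dyn + e_stat + e_sleep


for label, p_sleep in (("low-power sleep", 0.10), ("high-power sleep", 0.45)):
    race = energy_over_period(1.0, p_sleep)   # run fast, then sleep
    slow = energy_over_period(0.5, p_sleep)   # run slowly, no sleep
    winner = "race to halt" if race < slow else "run slowly"
    print(f"{label}: race={race:.2f} slow={slow:.2f} -> {winner}")
```

With the deep sleep state the fast setpoint wins; with the shallow one the slow setpoint wins, even though nothing else changed.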

Real Data: Execution Time

[Figure: curves for a memory-bound and a CPU-bound workload]

Real Data: Total Energy (Measured)

[Figure: measured curves for a CPU-bound and a memory-bound workload, compared with the naïve model]

Real Data: Including Sleep Energy

[Figure: curves for a high-power and a low-power sleep state]

Energy management is complex!

Optimal setting depends on:

• Workload: memory-bound vs CPU-bound vs in-between

• Hardware platform: static vs dynamic energy, CPU vs memory power, depth of sleep states and cost of entering them

Simple models don’t work!

Summary: Energy-Management Basics

How to establish memory-boundedness?

Easy way out: pre-characterization

• measure behavior off-line

• determine optimal power setting by model or trial-and-error

Ok-ish for pre-defined workloads

Unsuitable for open systems

• ... such as phones

Tricky with apps which change behavior

Characterizing Workloads

Need to observe app and adjust setting

• works for any app

• adjusts to changing behavior

Solution by [Snowdon et al., EuroSys’09]

Performance counters are your friends!

• e.g. cache misses indicate memory access

Can systematically select best counters

• build model of platform

• linear combination of performance-counter readings

• pre-characterize hardware

• pick counters which provide most accurate model

• using sound statistical methods

Better Way: On-Line Characterization
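Counter selection can be sketched as fitting a linear model per candidate counter and keeping the one that explains the calibration data best. This is a deliberately reduced illustration: it fits one counter at a time rather than a combination, the counter names and all numbers are made up, and it is not the procedure from the cited paper.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope·x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(xs, ys):
    """Fraction of the variance in ys explained by the fitted line."""
    slope, intercept = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical off-line calibration runs: counter readings per benchmark.
samples = {
    "cache_misses_per_kcycle": [1, 5, 10, 20, 40],
    "branches_per_kcycle": [120, 90, 150, 100, 130],
}
# Measured fraction of time stalled on memory for the same benchmarks.
mem_stall_fraction = [0.02, 0.10, 0.21, 0.39, 0.80]

best = max(samples, key=lambda name: r_squared(samples[name], mem_stall_fraction))
print("most predictive counter:", best)
```

At run time the fitted slope and intercept would then turn live counter readings into an estimate of memory-boundedness.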

Model predicts energy consumption and relative execution speed

• at present setpoint

• at different setpoints

Accurately predicts energy and performance response to DVFS

• within a few %

Can use this for informed energy-management decisions

On-Line Characterization & Modeling

Accuracy of Approach

[Figure: curves for a memory-bound and a CPU-bound workload]

Effect on Energy

[Figure: curves for a CPU-bound and a memory-bound workload]

What is “best”?

• Maximal performance?

• Minimal energy?

• Minimal power?

Depends...

May change

• battery depletes

Need flexible policies

Energy Management Policies

[Diagram with blocks: Workload Statistics, Workload Prediction, Candidate Setpoints, QoS Info, Energy/Performance Models, Selection Policy, Setting]

Generalized Energy-Delay Policy
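A tunable policy of this kind can be sketched as a single knob that trades energy against delay when choosing among candidate setpoints. The weighting E^(1-α)·T^(1+α) used here is an assumed form chosen for illustration and may differ from the formula on the original slide; the candidate numbers are made up.

```python
def select_setpoint(candidates, alpha):
    """Pick the setpoint minimising E^(1 - alpha) * T^(1 + alpha).

    alpha = -1 minimises energy, alpha = +1 minimises execution time,
    alpha = 0 minimises the energy-delay product (assumed weighting).
    """
    return min(candidates,
               key=lambda c: c[1] ** (1 - alpha) * c[2] ** (1 + alpha))


# (frequency in MHz, predicted energy in J, predicted time in s), hypothetical.
candidates = [
    (200, 1.0, 4.0),
    (400, 0.8, 2.2),
    (800, 1.1, 1.3),
]

print("min energy  :", select_setpoint(candidates, alpha=-1)[0], "MHz")
print("max perf    :", select_setpoint(candidates, alpha=+1)[0], "MHz")
print("energy-delay:", select_setpoint(candidates, alpha=0)[0], "MHz")
```

A management module could move alpha towards -1 as the battery depletes without touching the models or the mechanism.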

Generalized Energy-Delay Policy

[Figure: performance and energy for a CPU-bound and a memory-bound workload]

Multi-Tasking Workload

[Figure: curves for a CPU-bound and a memory-bound workload]

Implementation of power model and policies

• once for platform vs once for each guest

• no guest has global view, hypervisor does

• integration with other cores: DSPs, baseband processor

• policy-mechanism separation

Why do it outside the OS?

Controls all resources

• CPU, memory, devices

De-privileged guest OSes

• execute in user mode

• prevents interference with hypervisor and with other guests

• ensures hypervisor retains control over resources

The Hypervisor

Subsystems compete for it

Cannot let subsystems manage it

• just as with memory, CPU

Needs trusted, central authority

Needs to be done in virtualization layer

Energy is a Global Resource

Mechanisms in hypervisor

Policies in isolated management module

Keep hypervisor policy-free

• HW-like

Policy-Mechanism Separation

Additional degree of freedom

• DVFS + sleep states + core shutdown

• Hypervisor supports transparent, temporary consolidation of cores

• Unneeded cores turned off to reduce power

Different tradeoffs

• Performance vs power close to linear

Important to manage cores globally

• On average more cores off than with per-guest management

• Can use deeper sleep state

• Less overall energy use

Enter Multicore

[Diagram: OKL4 Microvisor hosting Subsystem #1 and Subsystem #2, with four VCPUs mapped onto four CPUs]
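The benefit of global core management can be illustrated by counting how many cores must stay powered under each scheme. This is a rough sketch: the guest names and loads are invented, loads are VCPU utilisation in percent of one core, and scheduling constraints such as placing an indivisible VCPU are ignored.

```python
def cores_for(load_percent):
    """Cores needed to carry a total load, rounding up to whole cores."""
    return -(-load_percent // 100)   # integer ceiling division


def per_guest_cores(guests):
    """Each guest consolidates only its own VCPUs onto its own cores."""
    return sum(cores_for(sum(vcpu_loads)) for vcpu_loads in guests.values())


def global_cores(guests):
    """The hypervisor consolidates all VCPUs across guests."""
    return cores_for(sum(sum(vcpu_loads) for vcpu_loads in guests.values()))


# Hypothetical VCPU loads per subsystem.
guests = {
    "subsystem_1": [30, 20],
    "subsystem_2": [25, 15],
}

print("cores on, per-guest management:", per_guest_cores(guests))
print("cores on, global management:   ", global_cores(guests))
```

In this example per-guest management keeps two cores on while global consolidation needs only one, leaving the rest free for a deeper sleep state.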

Cache coherency couples clock frequencies of multiple cores

OSes running on different cores cannot adjust clock independently

Requires entity with global view

Enter Multicore: Architectural Constraints
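When core clocks are coupled, a single entity has to reconcile the frequency requests of all guests. The sketch below assumes one shared clock domain and a "satisfy the most demanding guest" rule; both the rule and the names are assumptions, not OKL4 behaviour.

```python
def shared_clock(requests_mhz, available_mhz):
    """Choose one frequency for a clock domain shared by several guests.

    Picks the lowest available setpoint that satisfies every request and
    falls back to the highest setpoint if none is fast enough.
    """
    needed = max(requests_mhz.values())
    candidates = [f for f in available_mhz if f >= needed]
    return min(candidates) if candidates else max(available_mhz)


# Hypothetical per-guest requests and hardware setpoints.
requests = {"guest_a": 300, "guest_b": 600}
print(shared_clock(requests, available_mhz=[200, 400, 800]), "MHz")
```

Neither guest could make this choice alone, because each sees only its own request.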

Cores have same ISA but different clock rates

Hypervisor can determine optimal mapping of subsystems to cores

• Using same infrastructure as for DVFS

• Integrate with temporary core consolidation

Asymmetric Multicore

[Diagram: OKL4 Microvisor placing the CPU-bound subsystem on the fast CPU and the memory-bound subsystem on the slow CPU]
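A mapping step for asymmetric cores can be sketched by ranking subsystems by how CPU-bound they are and handing out the fastest cores first. The memory-boundedness scores are assumed to come from an on-line model; the names and numbers here are hypothetical.

```python
def map_subsystems(mem_boundedness, cores):
    """Assign the least memory-bound subsystems to the fastest cores.

    mem_boundedness: subsystem name -> score in [0, 1] (1 = fully memory-bound)
    cores: list of (core name, clock in MHz)
    """
    fastest_first = sorted(cores, key=lambda core: core[1], reverse=True)
    most_cpu_bound_first = sorted(mem_boundedness, key=mem_boundedness.get)
    return {sub: core[0]
            for sub, core in zip(most_cpu_bound_first, fastest_first)}


# Hypothetical scores and cores.
mapping = map_subsystems(
    {"media_codec": 0.1, "ui_guest": 0.7},
    [("slow_cpu", 400), ("fast_cpu", 1000)],
)
print(mapping)
```

A memory-bound subsystem loses little on the slow core, since most of its time is spent waiting on memory rather than on the clock.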

Move subsystems between cores

• including temporary consolidation of different subsystems on common core

Architectural inter-core dependencies

• cannot manage core clocks independently

Requires global control

• ... outside individual OSes

• indirection layer between OS and hardware

No practical alternative to virtualization!

The Future is Multicore


Virtualization is unavoidable long-term

... but provides other benefits short-term

Early uptake maximises benefits

Future-proof your designs!

Summary

Thank You!
