Xen Summit 2010
Extending Xen into Embedded and Communications Workloads
Agenda
• Embedded Usage Models
• Virtual Machine Monitor Requirements
• Benchmarking
• Cisco Product Range
• Embedded Development Requirements
• High Availability
09.14.05
Embedded Usage Models
• IP Media Phones: Atom-based platforms delivering Internet connectivity and media content to continuously connected devices.
• Robotics: Core microarchitecture platforms providing a GUI alongside real-time industrial control.
• Routing: Xeon microarchitecture-based platforms implementing control- and data-plane services on high-end routers.
Unique VMM requirements across all segments
Virtual Machine Monitor Implementation
• Industrial: industrial control requires determinism. Performance is measured in interrupt latency (10 usec or lower).
• Comms appliance: scalability, flexibility, RAS, and failover are a few of the VMM requirements in the comms appliance environment.
• Media phone: a critical partition is required to host the cell phone application; the hypervisor must provide Quality of Service.
[Figure: three VMM stacks. Industrial (vmm): RTOS and Microsoft GUI with shared memory. Comms appliance (vmm): (Service) Linux, Linux, RTOS. Media phone (thin vmm): critical partition and app partition.]
Embedded Virtualization - Advantages
• Consolidation and preservation of legacy, proprietary, single-threaded operating systems
• Rapid deployment of new services
• Integrated development environment kept separate from critical services
[Figure: multi-core architecture (Core 0, Core 1) running a vmm that hosts two legacy RTOS guests and Linux, labeled with data-plane and control services; VT-d / SR-IOV exposes rx/tx queues of a 10 Gb/s PF to the guests.]
Embedded Deployment Requirements
Scheduling control for guest Quality of Service
• Traffic prioritization to avoid packet loss requires (soft) real-time scheduling
• Credit-based scheduler research in progress
Grant tables: consolidate the fast path with a security (intrusion detection) application
• Requires an efficient mechanism to share packet data with the Linux application
• Grant tables (io rings) may be an efficient mechanism to meet the performance requirements (needs to be lock free)
[Figure: two Xen deployments. On Atom (single-core scheduling): app development, phone application, and Dom0 guests with I/O. On Xeon (consolidated fast path): fast path and Dom0 forwarding, with a Linux guest running intrusion detection on IP packets shared over io rings.]
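To make the lock-free requirement for shared io rings concrete, here is a minimal single-producer/single-consumer ring sketch in C11. It is not Xen's grant-table or io-ring interface; the names pkt_desc, pkt_ring, ring_init, ring_put and ring_get, and the RING_SIZE value, are hypothetical. It assumes exactly one producer (the fast path) and one consumer (the Linux intrusion detection application) sharing the structure.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 8u /* hypothetical size; must be a power of two */
static_assert((RING_SIZE & (RING_SIZE - 1)) == 0, "RING_SIZE must be a power of two");

/* Hypothetical descriptor: where a packet lives in the shared buffer. */
struct pkt_desc {
    uint32_t offset;
    uint32_t len;
};

struct pkt_ring {
    _Atomic uint32_t prod; /* written only by the producer */
    _Atomic uint32_t cons; /* written only by the consumer */
    struct pkt_desc slots[RING_SIZE];
};

static void ring_init(struct pkt_ring *r)
{
    atomic_init(&r->prod, 0);
    atomic_init(&r->cons, 0);
}

/* Producer side (fast path). Returns false when the ring is full. */
static bool ring_put(struct pkt_ring *r, struct pkt_desc d)
{
    uint32_t prod = atomic_load_explicit(&r->prod, memory_order_relaxed);
    uint32_t cons = atomic_load_explicit(&r->cons, memory_order_acquire);

    if (prod - cons == RING_SIZE)
        return false;
    r->slots[prod & (RING_SIZE - 1)] = d;
    /* Release: the slot contents become visible before the new index. */
    atomic_store_explicit(&r->prod, prod + 1, memory_order_release);
    return true;
}

/* Consumer side (Linux application). Returns false when the ring is empty. */
static bool ring_get(struct pkt_ring *r, struct pkt_desc *out)
{
    uint32_t cons = atomic_load_explicit(&r->cons, memory_order_relaxed);
    uint32_t prod = atomic_load_explicit(&r->prod, memory_order_acquire);

    if (cons == prod)
        return false;
    *out = r->slots[cons & (RING_SIZE - 1)];
    /* Release: the slot is handed back only after it has been copied out. */
    atomic_store_explicit(&r->cons, cons + 1, memory_order_release);
    return true;
}
```

Neither side takes a lock: each index has a single writer, and the acquire/release pairs order slot accesses against index updates. When ring_put returns false the fast path has to decide whether to drop the packet or retry, which is a policy choice outside this sketch.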
Embedded Xen Deployment
• The power profile of some edge-based appliances is cyclical, so the potential power savings can be substantial (example: base station controller).
• ACPI is generally not supported in real-time / proprietary operating systems.
• Hypervisor power management could be very useful for controlling the overall power budget.
• “Shelf Manager” power management research is in progress.
• Intelligent power management balances I/O latency and throughput.
[Figure: data and voice load (scale 0 to 120) over the day, 6am to 6pm, shown next to multi-core Xen systems running Dom0, a shelf manager (Shf mgr), and a varying number of fast-path guests.]
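As an illustration of a policy that trades I/O latency against power on a cyclical load, the sketch below picks how many fast-path cores to keep powered for a given load level. This is not the shelf manager's algorithm, which the slides do not describe; the function name cores_for_load, the percentage-based load input, and the headroom parameter are all assumptions made for the example.

```c
#include <assert.h>

/*
 * Hypothetical policy: number of fast-path cores to keep powered.
 * load_pct     - current offered load as a percentage of full capacity
 * headroom_pct - spare capacity kept online to protect I/O latency
 * total_cores  - cores available to the fast path
 */
static unsigned cores_for_load(unsigned load_pct, unsigned headroom_pct,
                               unsigned total_cores)
{
    unsigned target, cores;

    assert(total_cores > 0);
    assert(load_pct <= 1000 && headroom_pct <= 1000); /* keep the sum small */

    target = load_pct + headroom_pct;
    if (target > 100)
        target = 100; /* cannot power more than everything */

    /* Round up so capacity is never below the target. */
    cores = (target * total_cores + 99) / 100;

    /* Always keep one fast-path core running. */
    return cores > 0 ? cores : 1;
}
```

With four cores and 10% headroom, a 20% overnight load keeps two cores powered and a 60% daytime load keeps three; a larger headroom favors latency, a smaller one favors power savings.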
Embedded Xen - Direct Cache Access
• DCA (Direct Cache Access) delivers I/O data directly into the CPU cache to reduce average memory latency, and attempts to reduce memory bandwidth use.
• The DCA driver uses get_cpu() to identify the current CPU and obtain its APIC ID, which it uses to configure the DCA-enabled NIC.
• Under Xen, get_cpu() must return the valid APIC ID of the core on which the guest is executing.

    static void igb_update_dca(struct igb_q_vector *q_vector)
    {
            struct igb_adapter *adapter = q_vector->adapter;
            struct e1000_hw *hw = &adapter->hw;
            int cpu = get_cpu(); /* Get the current CPU ID */

            /* Nothing to do if this queue is already tagged for this CPU. */
            if (q_vector->cpu == cpu)
                    goto out_no_update;
            /* ... rest of the function is not shown on the slide ... */

[Figure: I/O device, IOH with DCA, memory, and CPU cache; under Xen, Dom0 and the guest each run on a CPU with its own cache.]
Benchmarking, 10 GbE perspective
• A 64B packet can arrive every 67.2 ns.
• In terms of processor cycles: at 2.53 GHz, a 64B packet arrives every ~170 cycles.
• A port can generate up to 14.88 million Rx and 14.88 million Tx transactions (packets) every second.
• Each packet has a 16B descriptor associated with it, which must be written for every packet that needs to be processed.
• The Linux forwarding code takes ~3000 cycles to process a packet.
• With enhancements, we can reduce the number of cycles per (64-byte) packet to ~1350.
[Figure: maximum packet rate (Mpp/s, axis 0 to 16,000,000) versus packet size (64 to 1468 bytes).]
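The 10 GbE figures above follow from simple wire arithmetic, sketched here in C. The helper names are hypothetical; the only fact added beyond the slide is the standard Ethernet per-frame overhead of 20 bytes on the wire (8 bytes of preamble and start-of-frame delimiter plus a 12-byte inter-frame gap). At 2.53 GHz the 67.2 ns interval is about 170 cycles; a figure of about 201 cycles corresponds to a 3.0 GHz clock.

```c
#include <assert.h>

/* Preamble + SFD (8 bytes) plus inter-frame gap (12 bytes). */
#define WIRE_OVERHEAD_BYTES 20.0

/* Time between back-to-back frames in ns; 1 Gb/s is 1 bit per ns. */
static double frame_interval_ns(double frame_bytes, double link_gbps)
{
    assert(link_gbps > 0.0);
    return (frame_bytes + WIRE_OVERHEAD_BYTES) * 8.0 / link_gbps;
}

/* Maximum frames per second in one direction at line rate. */
static double line_rate_pps(double frame_bytes, double link_gbps)
{
    return 1e9 / frame_interval_ns(frame_bytes, link_gbps);
}

/* CPU cycles that elapse between frame arrivals. */
static double cycles_per_frame(double frame_bytes, double link_gbps,
                               double cpu_ghz)
{
    return frame_interval_ns(frame_bytes, link_gbps) * cpu_ghz;
}

/* Packets per second one core can forward at a given per-packet cost. */
static double core_pps(double cpu_ghz, double cycles_per_packet)
{
    assert(cycles_per_packet > 0.0);
    return cpu_ghz * 1e9 / cycles_per_packet;
}
```

For 64-byte frames at 10 Gb/s this gives 67.2 ns and about 14.88 million packets per second. At 2.53 GHz, a 3000-cycle forwarding path sustains roughly 0.84 million packets per second per core, and a 1350-cycle path roughly 1.87 million, which is why per-packet cycle cost dominates small-packet performance.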
Guest Forwarding Performance
Layer 3 forwarding, 2-port (1 core, 1 thread), native versus virtualized
Single-threaded virtualized environments show promising performance:
- Near-native performance for small packet sizes
- Native performance for large packet sizes (>256B)
Limited performance penalty for consolidation; additional scaling tests are in progress.
[Figure: packets per second (PPS) versus packet size (64, 128, 256, 512, 768, 1024, 1280, 1518 bytes), native and virtualized. Test setup: multi-core architecture with a vmm and VT-d; Core 0 and Core 1 each run a Linux forwarding guest with its own I/O.]
Cisco Embedded Product Space
Wide range of products in a number of market segments:
• Data Center: UCS, Nexus 7000
• Service Provider: CRS, ASR 9000
• Enterprise
• Voice & Video: Unified Communications, TelePresence
• Security: ASA 5500, IronPort
• Communications
• Branch: 3900 ISR, 2800 ISR
• Home: Valet, Flip Video
• Also shown: ASR 1000, MDS 9222i (SAN)
Embedded Product Environment
Hardware Environment
• General-purpose CPUs, SoCs, ASICs, FPGAs, custom processors, IXP, DSPs, …
• From large multi-core, multi-blade, multi-chassis systems to small single/dual-core devices
• Terabit to Gigabit I/O
Software Environment
• Multi-OS: IOS, IOS-XE, IOS-XR, NX-OS
• Proprietary (legacy), Linux, other …
• Single threaded, multi-threaded, pipelined, flow-based, …
Multiple VM models
• Integrated services platform, distributed/load balancing, HA, control & data separation, …
• Control plane, data plane, management plane, appliance and service engines, …
• e.g., routing, data, voice, video, deep packet inspection, firewall, security, etc.
• Memory, processor, and I/O bandwidth requirements vary by application and network device location
Embedded Development Requirements
We believe that Xen is the right choice for an embedded hypervisor
• Early support for prototype hardware required: in hypervisor and dom0
• Open source Xen and Linux critical to this effort
• It’s the right architecture and feature set for embedded development
RAS
• High Availability (HA) for guests: non-disruptive stateful failover, non-disruptive in-service software upgrade (ISSU)
• Devices: hot pluggable/removable (non-disruptive), shared & dedicated (including SR-IOV)
• dom0: separate device driver domains are good, but not enough
• All domains need to be restartable
Deterministic Performance
• QoS control through configuration and scheduling
• I/O linearly scalable across cores and VMs
• Low-latency interrupts
Embedded Development Requirements
Core allocation/scheduling: vcpu → pcpu mapping
• (pinned, non-shared): deterministic performance
• (pinned, shared), (non-pinned, shared): scheduled
For pv IOS, I/O workload, 64-byte packets, 2 ports, bidirectional, 64-bit Xen, NUMA on:
• (pinned, non-shared), HT off: 100% line rate (1 Gb) per core; <0.1% of time spent in hypervisor
• (non-pinned, shared), HT off: ~10% decreased throughput
• (pinned, non-shared), NUMA-remote, HT off: ~8% decreased throughput
• (pinned, non-shared), HT on, one vcpu on each thread of the core: 1.5x/1.7x (I/O/cpu) increase in aggregate throughput; 0.75x/0.85x (I/O/cpu) throughput per transaction for a single thread
• (pinned, non-shared), HT on, only one thread on the core in use: same as (pinned, non-shared), HT off
Embedded Development Requirements
Guest Support
• Both pv and hvm (hybrid!)
• 32-bit & 64-bit
• Virtual memory paged and non-paged (single, flat address space)
Debug and Performance Monitoring
• Multi-guest, simultaneous
• 32-bit & 64-bit guests (minimum is gdbsx for both pv & hvm)
• Performance monitoring tools (access to PMU data: Xenoprof & others)
• Required in the field as well as during development
Trusted Systems: Secure Products
• Trusted boot, TPM, Intel TXT/AMD-V
• Trusted guests, sandboxed 3rd party guests, anti-counterfeiting, …
Manageable
Power Management
• Especially at the edge, branch, and consumer devices
• Policy based, managed by hypervisor
• Cases where guest should not be automatically power managed
“Carrier class” Xen Development Environment
• Support for rapid prototyping
• Support for production product environment
HA Requirements
Rationale
• HA & ISSU features available on many platforms across our product space today
• Cannot go to market without support in certain product spaces
• Software fails much more often than hardware
• Software-only HA/ISSU at much lower cost is very attractive
• Natural fit on multi-core devices
High Availability (HA)
• Active-Standby: stateful, “hot” Standby
• Failure of Active causes non-disruptive failover to Standby
• Reconciliation required on switchover
• Standby progresses through state machine to Active state
• I/O devices always belong to Active and switch to [new] Active without loss of state
• Packet loss is ok on switchover; higher-level protocols recover
• Downstream end of device connection must not see a “failure”
• Switchover must take place in < 1 sec.
In Service Software Upgrade (ISSU)
• Built on HA infrastructure
• Automated software upgrade (or downgrade)
• Non-disruptive: fallback if required or requested
HA Requirements
What is needed:
Reliable fast failure detection mechanism
• Current: hardware uses interrupt pin; backup is heart-beat mechanism (slow)
• Need to emulate/implement a fast, reliable failure detection mechanism in Xen
Failover device transparently from Active to Standby
• No loss of [device] state
• Packet traffic dropped until Standby transitions to Active
Interrupts
• Redirected to new Active (old Standby) on failover
• Interrupts dropped until Standby transitions to Active
• [new] Active must be able to address outstanding interrupts without complete reset
Need to be able to run in redundant hardware configuration or on multi-core device
• Drivers responsible for appropriate reconciliation protocols
Minimize the changes to Xen kernel and dom0 code
• Recovery decisions need to be in the domain of the guest driver
Support for direct assign devices (including SR-IOV) and shared devices
Non-shared-memory solution for DMA target memory preferred
• Requires ability to either pre-program and switch or reprogram and switch on failover
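The Active-Standby behavior described in the HA requirements can be sketched as a small state machine driven by heartbeat timeouts. This models only the heart-beat backup mechanism, not the interrupt-pin detection or any Xen interface; the type and function names (ha_peer, ha_init, ha_heartbeat, ha_poll, ha_reconciled), the intermediate RECONCILING state, and the millisecond clock are assumptions for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Standby progresses through reconciliation before becoming Active. */
enum ha_state { HA_STANDBY, HA_RECONCILING, HA_ACTIVE };

struct ha_peer {
    enum ha_state state;
    uint64_t last_heartbeat_ms; /* last time the Active was heard from */
    uint64_t timeout_ms;        /* silence longer than this means failure */
};

static void ha_init(struct ha_peer *p, uint64_t now_ms, uint64_t timeout_ms)
{
    /* Detection must leave time to finish switchover within 1 second. */
    assert(timeout_ms > 0 && timeout_ms < 1000);
    p->state = HA_STANDBY;
    p->last_heartbeat_ms = now_ms;
    p->timeout_ms = timeout_ms;
}

/* Record a heartbeat received from the Active. */
static void ha_heartbeat(struct ha_peer *p, uint64_t now_ms)
{
    p->last_heartbeat_ms = now_ms;
}

/* Called periodically on the Standby; returns true when failover starts. */
static bool ha_poll(struct ha_peer *p, uint64_t now_ms)
{
    if (p->state != HA_STANDBY)
        return false;
    if (now_ms - p->last_heartbeat_ms < p->timeout_ms)
        return false;
    p->state = HA_RECONCILING; /* packets and interrupts dropped from here */
    return true;
}

/* Called once the drivers have finished their reconciliation protocols. */
static void ha_reconciled(struct ha_peer *p)
{
    if (p->state == HA_RECONCILING)
        p->state = HA_ACTIVE; /* devices and interrupts now belong to us */
}
```

The timeout bounds detection latency, so a heartbeat-only design has to pick it well below the one-second switchover budget, which is why a faster detection path than heartbeats is listed as a requirement.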
“Carrier class” Xen Development Environment
Needs to support 2 different environments:
Rapid prototyping and development of new services
• Work often requires unstable branch, pre-release/prototype hardware
• Straightforward, and accessible to the non-Xen expert
• Interest is in getting the prototype/product up and running quickly rather than in Xen infrastructure
• Developer threads, blogs, etc. are not a substitute for up-to-date documentation
• Product decisions (go/no go) based on prototype results
• Failure/missed deadlines will eliminate a prototype as a possible solution
• Corporate networks/labs are behind firewalls and use proxies
- Doesn’t work well with current git-based source control
- Requires exceptions to corporate IT policy
Production product
• Uses stable release
• Controlled access to performance & debug tools in customer environment
• Documentation required in field as well
• Auditing requires ability to reproduce image bit-for-bit from local build
Summary
• Embedded market provides a great growth opportunity
• Deployment requires some unique features
• Xen is well positioned but requires support for RAS features, debug, and a “Carrier Class” release