ES19 Under the Hood: Inside the Cloud Computing Hosting Environment
TRANSCRIPT
Under the Hood: Inside The Cloud Computing Hosting Environment
Erick Smith, Development Manager, Microsoft Corporation
Chuck Lenzmeier, Architect, Microsoft Corporation
ES19
Introduce the fabric controller
Introduce the service model
Give some insight into how it all works
Describe the workings at the data center level
Then zoom in to a single machine
Purpose Of This Talk/Agenda
Resource allocation
Machines must be chosen to host roles of the service
Fault domains, update domains, resource utilization, hosting environment, etc.
Procure additional hardware if necessary
IP addresses must be acquired
Provisioning
Machines must be set up
Virtual machines created
Applications configured
DNS set up
Load balancers must be programmed
Upgrades
Locate appropriate machines
Update the software/settings as necessary
Only bring down a subset of the service at a time
Maintaining service health
Software faults must be handled
Hardware failures will occur
Logging infrastructure is provided to diagnose issues
This is ongoing work…you’re never done
Deploying A Service Manually
Windows Azure Fabric Controller
[Diagram: a highly-available Fabric Controller manages the data center through the switches and load-balancers, using out-of-band communication for hardware control and in-band communication for software control. Each node runs the WS08 hypervisor hosting service-role VMs plus a control VM (WS08) running the agent; a node can be a VM or a physical machine.]
Fabric Controller (FC)
Maps declarative service specifications to available resources
Manages service life cycle starting from bare metal
Maintains system health and satisfies the SLA
What's special about it
Model-driven service management
Enables utility-model shared fabric
Automates hardware management
Windows Azure Automation
[Diagram: the service owner describes "what" is needed; the Fabric Controller makes it happen across the fabric of switches and load-balancers.]
Owns all the data center hardware
Uses the inventory to host services
Similar to what a per-machine operating system does with applications
The FC provisions the hardware as necessary
Maintains the health of the hardware
Deploys applications to free resources
Maintains the health of those applications
Fabric Controller
Modeling Services
Fundamental service model elements: load balancer channel, endpoint, interface, directory, resource, load balancer
[Diagram: a service template automatically maps to the service model; the public Internet reaches the service through a load balancer, which fronts a front-end web role connected to a background process role.]
The topology of your service
The roles and how they are connected
Attributes of the various components
Operating system features required
Configuration settings
Describe exposed interfaces
Required characteristics
How many fault/update domains you need
How many instances of each role
What You Describe In Your Service Model…
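To make the description above concrete, here is a rough sketch of such a declarative model. This is not the actual Windows Azure service definition format; every name and field in it is hypothetical.

```python
# Hypothetical sketch of a declarative service model; not the real
# Windows Azure service definition schema, just the kinds of facts it captures.
service_model = {
    "name": "MyService",
    "roles": {
        "FrontEndWeb": {
            "os_features": ["IIS7"],                  # operating system features required
            "config_settings": ["ConnectionString"],  # declared by developer, set by deployer
            "endpoints": [{"name": "HttpIn", "protocol": "http", "port": 80}],
            "instances": 4,                           # how many instances of this role
        },
        "BackgroundProcess": {
            "os_features": [],
            "config_settings": ["QueueName"],
            "endpoints": [],                          # no exposed interface
            "instances": 2,
        },
    },
    "channels": [("FrontEndWeb", "BackgroundProcess")],  # topology: who talks to whom
    "fault_domains": 2,                                  # required characteristics
    "update_domains": 2,
}
```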
Allows you to specify what portion of your service can be offline at a time
Fault domains are based on the topology of the data center
Switch failure
Statistical in nature
Update domains are determined by what percentage of your service you will take out at a time for an upgrade
You may experience outages for both at the same time
System considers fault domains when allocating service roles
Example: don't put all roles in the same rack
System considers update domains when upgrading a service
Fault/Update Domains
[Diagram: role instance allocation is spread across fault domains.]
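As a toy illustration of the two concepts, the snippet below spreads four instances of one role across two fault domains and two update domains, so that neither a rack failure nor an upgrade step takes down more than half of the role. The layout is an assumption for illustration only.

```python
# Toy layout: 4 instances of one role over 2 fault domains x 2 update domains.
instances = ["web_0", "web_1", "web_2", "web_3"]
placement = {
    inst: {"fault_domain": i % 2, "update_domain": i // 2}
    for i, inst in enumerate(instances)
}
# web_0 -> FD 0/UD 0, web_1 -> FD 1/UD 0, web_2 -> FD 0/UD 1, web_3 -> FD 1/UD 1:
# a switch failure in one rack or an upgrade of one update domain hits at most two instances.
```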
Purpose: communicate settings to service roles
There is no "registry" for services
Application configuration settings
Declared by developer
Set by deployer
System configuration settings
Pre-declared, same kinds for all roles
Instance ID, fault domain ID, update domain ID
Assigned by the system
In both cases, settings accessible at run time
Via call-backs when values change
Dynamic Configuration Settings
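A minimal sketch of the settings-plus-call-backs idea, assuming a simple in-memory environment object. This is not the real Windows Azure runtime API; the class and method names are invented for illustration.

```python
# Invented stand-in for the runtime settings mechanism; not the Azure SDK.
class RoleEnvironment:
    def __init__(self, settings):
        self._settings = dict(settings)   # application + system settings
        self._listeners = []

    def get_setting(self, name):
        return self._settings[name]

    def on_changed(self, callback):
        # register a call-back invoked when a value changes at run time
        self._listeners.append(callback)

    def apply_change(self, name, value):
        # e.g. the deployer edits an application setting
        self._settings[name] = value
        for cb in self._listeners:
            cb(name, value)

env = RoleEnvironment({"ConnectionString": "initial", "InstanceId": "web_0"})
env.on_changed(lambda name, value: print(f"setting {name} changed to {value}"))
env.apply_change("ConnectionString", "updated")
```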
Windows Azure Service Lifecycle
Goal is to automate the life cycle as much as possible
Coding & Modeling: new services and updates (Developer)
Provisioning: desired configuration (Developer/Deployer)
Deployment: mapping and deploying to actual hardware; network configuration (Automated)
Later stages of the life cycle are likewise automated
Resource allocation
Nodes are chosen based on constraints encoded in the service model
Fault domains, update domains, resource utilization, hosting environment, etc.
VIPs/LBs are reserved for each external interface described in the model
Provisioning
Allocated hardware is assigned a new goal state
FC drives hardware into the goal state
Upgrades
FC can upgrade a running service
Maintaining service health
Software faults must be handled
Hardware failures will occur
Logging infrastructure is provided to diagnose issues
Lifecycle Of A Windows Azure Service
Primary goal: find a home for all role instances
Essentially a constraint satisfaction problem (a toy allocation sketch follows below)
Allocate instances across "fault domains"
Example constraints include:
Only roles from a single service can be assigned to a node
Only a single instance of a role can be assigned to a node
Node must contain a compatible hosting environment
Node must have enough resources remaining
Service model allows for simple hints as to the resources the role will utilize
Node must be in the correct fault domain
Nodes should only be considered if healthy
A machine can be sub-partitioned into VMs
Performed as a transaction
Resources Come From Our Shared Pool
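The snippet below is a greedy toy allocator that checks the example constraints listed above for each instance. The real FC solves this as a constraint satisfaction problem performed as a transaction, so treat this only as an illustration with assumed field names.

```python
# Greedy toy allocator over the example constraints; nodes and instances are
# plain dicts, node["roles"] is a set, and resources are a single number.
def satisfies(node, inst):
    return (
        node["healthy"]                                        # only healthy nodes considered
        and node["hosting_env"] == inst["hosting_env"]         # compatible hosting environment
        and node["free_resources"] >= inst["resources"]        # enough resources remaining
        and node["service"] in (None, inst["service"])         # one service per node
        and inst["role"] not in node["roles"]                  # one instance of a role per node
        and node["fault_domain"] == inst["fault_domain"]       # correct fault domain
    )

def allocate(role_instances, nodes):
    placement = {}
    for inst in role_instances:
        node = next((n for n in nodes if satisfies(n, inst)), None)
        if node is None:
            raise RuntimeError("no home found; roll back the whole transaction")
        node["service"] = inst["service"]
        node["roles"].add(inst["role"])
        node["free_resources"] -= inst["resources"]
        placement[inst["name"]] = node["name"]
    return placement
```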
Key FC Data Structures
[Diagram: the FC's object model relates Service Description, Role Description, and Role Instance Description to Logical Service, Logical Role, Logical Role Instance, and Logical Node, which maps onto a Physical Node.]
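One plausible shape for a few of these structures, written as dataclasses. The fields are inferred from the slide and from the node-state discussion that follows, not from the actual FC object model.

```python
# Illustrative dataclasses for the structures named on the slide; the fields
# are assumptions, not the FC's actual object model.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class RoleInstanceDescription:
    role_name: str
    instance_id: str
    fault_domain: int
    update_domain: int

@dataclass
class LogicalNode:
    node_id: str
    goal_state: str = "unassigned"
    current_state: str = "unknown"
    instances: List[RoleInstanceDescription] = field(default_factory=list)

@dataclass
class PhysicalNode:
    machine_id: str
    healthy: bool = True
    logical_node: Optional[LogicalNode] = None
```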
Maintaining Node State
[Diagram: each Logical Role Instance and each Logical Node tracks a goal state and a current state; the Logical Node maps to a Physical Node.]
FC maintains a state machine for each node
Various events cause the node to move into a new state
FC maintains a cache about the state it believes each node to be in
State reconciled with the true node state via communication with the agent
Goal state derived based on assigned role instances
On a heartbeat event the FC tries to move the node closer to its goal state (if it isn't already there)
FC tracks when the goal state is reached
Certain events clear the "in goal state" flag
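A sketch of the heartbeat-driven reconciliation loop described above; the state names and the single-step transition table are assumptions, not the FC's real state machine.

```python
# Heartbeat handler sketch: refresh the cached view from the agent, then
# nudge the node one (assumed) step toward its goal state.
STEP_TOWARD = {
    ("bare", "ready"): "provisioning",
    ("provisioning", "ready"): "ready",
    ("ready", "running_role"): "running_role",
}

def on_heartbeat(node):
    node["current_state"] = node["agent_reported_state"]   # reconcile with true state
    if node["current_state"] == node["goal_state"]:
        node["in_goal_state"] = True
        return
    node["in_goal_state"] = False                            # certain events clear the flag
    next_state = STEP_TOWARD.get((node["current_state"], node["goal_state"]))
    if next_state is not None:
        node["current_state"] = next_state                   # drive toward the goal state
```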
The FC Provisions Machines…
Virtual IPs (VIPs) are allocated from a pool
Load balancer (LB) setup
VIP and dedicated IP (DIP) pools are programmed automatically
DIPs are marked in/out of service as the FC's belief about the state of role instances changes
LB probing is set up to communicate with the agent on the node, which has real-time info on the health of the role
Traffic is only routed to roles ready to accept traffic
Routing information is sent to the agent to configure routes based on network configuration
Redundant network gear is in place for high availability
…And Other Data Center Resources
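The toy fragment below captures the routing rule described above: the load balancer forwards a VIP only to the dedicated IPs whose role instances the node agent reports as ready. It is illustrative only and does not reflect an actual load-balancer API.

```python
# Toy load-balancer table: a VIP forwards only to DIPs whose instances the
# node agent reports as ready to accept traffic.
def healthy_dips(role_instances):
    return [ri["dip"] for ri in role_instances if ri["agent_says_ready"]]

def program_lb(lb_table, vip, role_instances):
    lb_table[vip] = healthy_dips(role_instances)

lb_table = {}
program_lb(lb_table, "203.0.113.10:80",
           [{"dip": "10.0.1.4", "agent_says_ready": True},
            {"dip": "10.0.1.5", "agent_says_ready": False}])
# lb_table == {"203.0.113.10:80": ["10.0.1.4"]}
```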
Windows Azure FC monitors the health of roles
FC detects if a role dies
A role can indicate it is unhealthy
Upon learning a role is unhealthy:
Current state of the node is updated appropriately
State machine kicks in again to drive us back into the goal state
Windows Azure FC monitors the health of the host
If the node goes offline, FC will try to recover it
If a failed node can't be recovered, FC migrates role instances to a new node
A suitable replacement location is found
Existing role instances are notified of the configuration change
The FC Keeps Your Service Running
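A compressed sketch of the migration path for an unrecoverable node, reusing the same kind of placement search as initial allocation; all names and the simplified replacement search are assumptions.

```python
# Sketch of migrating role instances off a node the FC could not recover.
def find_replacement(failed_node, nodes):
    # stand-in for the constraint-based search used at initial allocation
    return next(n for n in nodes if n["healthy"] and n is not failed_node)

def handle_failed_node(failed_node, nodes, notify):
    for inst in list(failed_node["instances"]):
        new_node = find_replacement(failed_node, nodes)
        failed_node["instances"].remove(inst)
        new_node["instances"].append(inst)
        notify(inst, new_node)   # existing role instances see the configuration change
```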
FC can upgrade a running service
Resources deployed to all nodes in parallel
Done by updating one "update domain" at a time
Update domains are logical and don't need to be tied to a fault domain
Goal state for a given node is updated when the appropriate update domain is reached
Two modes of operation: manual and automatic
Rollbacks are achieved with the same basic mechanism
How Upgrades Are Handled
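A short sketch of the one-update-domain-at-a-time idea; the goal-state tuple and the wait_until_healthy hook are placeholders, not the FC's actual upgrade mechanism.

```python
# One-update-domain-at-a-time upgrade (or rollback, by passing the old version).
def rolling_upgrade(nodes, new_version, wait_until_healthy):
    domains = sorted({n["update_domain"] for n in nodes})
    for ud in domains:
        batch = [n for n in nodes if n["update_domain"] == ud]
        for n in batch:
            n["goal_state"] = ("running", new_version)   # FC drives each node to this state
        wait_until_healthy(batch)   # only this slice of the service is down at a time
```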
Windows Azure provisions and monitors hardware elements
Compute nodes, TOR/L2 switches, LBs, access routers, and node OOB control elements
Hardware life cycle management
Burn-in tests, diagnostics, and repair
Failed hardware taken out of the pool
Application of automatic diagnostics
Physical replacement of failed hardware
Capacity planning
On-going node and network utilization measurements
Proven process for bringing new hardware capacity online
Behind The Scenes Work
Your services are isolated from other services
Can access resources declared in the model only
Local node resources: temp storage
Network end-points
Isolation using multiple mechanisms
Automatic application of Windows security patches
Rolling operating system image upgrades
Service Isolation And Security
[Diagram: isolation mechanisms include managed code, restriction of privileges, the firewall, the virtual machine boundary, and IP filtering.]
FC is a cluster of 5-7 replicas
Replicated state with automatic failover
New primary picks up seamlessly from a failed replica
Even if all FC replicas are down, services continue to function
Rolling upgrade support for the FC itself
FC cluster is modeled and controlled by a utility "root" FC
Windows Azure FC Is Highly Available
[Diagram: the FC agent on a client node communicates with the primary FC node; the replication system copies state from the primary (FC core, object model, disk, holding uncommitted and committed data) to the secondary FC nodes, which hold committed data.]
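A toy version of the primary/secondary replication shown in the diagram, using the diagram's committed/uncommitted vocabulary; the quorum rule and class names are assumptions rather than the FC's actual replication protocol.

```python
# Toy primary/secondary replication: an update is uncommitted on the primary
# until enough secondaries store it, then it is committed.
class SecondaryFC:
    def __init__(self):
        self.committed = {}

    def store(self, key, value):
        self.committed[key] = value
        return True   # acknowledge

class PrimaryFC:
    def __init__(self, secondaries):
        self.secondaries = secondaries
        self.uncommitted = {}
        self.committed = {}

    def update(self, key, value):
        self.uncommitted[key] = value
        acks = sum(1 for s in self.secondaries if s.store(key, value))
        if acks + 1 > (len(self.secondaries) + 1) // 2:   # assumed majority rule
            self.committed[key] = self.uncommitted.pop(key)

fc = PrimaryFC([SecondaryFC(), SecondaryFC()])
fc.update("node42.goal_state", "running")
# the update is now committed on the primary and both secondaries, so a new
# primary elected after failover can pick up from the committed state
```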
Network has redundancy built in
Redundant switches, load balancers, and access routers
Services are deployed across fault domains
Load balancers route traffic to active nodes only
Windows Azure FC state check-pointed periodically
Can roll back to previous checkpoints
Guards against corrupted FC state, loss of all replicated state, and operator errors
FC state is stored on multiple replicas across fault domains
Windows Azure Fabric Is Highly Available
PDC release
Automated service deployment
Three service templates
Support for changing the number of running instances
Simple service upgrades/downgrades
Automated service failure discovery and recovery
External VIP address/DNS name per service
Service network isolation enforcement
Automated hardware management
Includes automated network load-balancer management
For 2009
Ability to model more complex applications
Richer service life-cycle management
Richer network management
Service Life-cycle
Windows Azure automates most functions
System takes care of running and keeping services up
Service owner in control
Self-management model through the portal
Secure and highly-available platform
Built-in data center management
Capacity planning
Hardware and network management
Summary
Virtualization And Deployment
Multi-tenancy with security and isolation
Improved 'performance/watt/$' ratio
Increased operations automation
Hypervisor-based virtualization
Highly efficient and scalable
Leverages hardware advances
Virtual Computing Environment
High-Level Architecture
[Diagram: the hypervisor sits between the hardware (NIC, disks, CPU) and the partitions. The host partition runs a Server Core host OS with drivers and the virtualization stack (VSP); each guest partition runs a Server Enterprise guest OS, its virtualization stack (VSC), and the applications. Partitions communicate over the VMBUS.]
Images are virtual hard disks (VHDs)
Offline construction and servicing of images
Separate operating system and service images
Same deployment model for the root partition
Image-Based Deployment
Image-Based Deployment
[Diagram: the host partition boots from a host-partition differencing VHD layered on the HV-enabled Server Core base VHD. Each guest partition boots from a guest-partition differencing VHD layered on the Server Enterprise base VHD, together with application VHDs built from the App1, App2, and App3 packages. The node also holds the Server Core and Server Enterprise base VHDs and a maintenance OS image.]
Deployment of images is just file copy
No installation
Background process
Multicast
Image caching for quick update and rollback
Servicing is an offline process
Dynamic allocation based on business needs
Net: high availability at lower cost
Rapid And Reliable Provisioning
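The fragment below models deployment-as-file-copy over the VHD layering in the diagram: a node copies only the image layers it does not already have cached. File names and the cache representation are invented for illustration.

```python
# Deployment as file copy over layered images: copy only the layers the node
# does not already have in its cache.
node_cache = {"ServerCore-base.vhd", "ServerEnterprise-base.vhd"}

guest_image = [
    "ServerEnterprise-base.vhd",   # shared, read-only base OS VHD
    "guest-diff-web_0.vhd",        # per-partition differencing VHD
    "FrontEndWeb-app.vhd",         # application VHD, built and serviced offline
]

def deploy(image_layers, cache):
    to_copy = [layer for layer in image_layers if layer not in cache]
    cache.update(to_copy)          # plain file copy; no installation step
    return to_copy

print(deploy(guest_image, node_cache))
# -> ['guest-diff-web_0.vhd', 'FrontEndWeb-app.vhd']
```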
Tech Preview offers one virtual machine type
Platform: 64-bit Windows Server 2008
CPU: 1.5-1.7 GHz x64 equivalent
Memory: 1.7 GB
Network: 100 Mbps
Transient local storage: 250 GB
Windows Azure storage also available: 50 GB
Full service model supports more virtual machine types
Expect to see more options post-PDC
Windows Azure Compute Instance
Hypervisor
Efficient: exploits the latest processor virtualization features (e.g., SLAT, large pages)
Scalable: NUMA-aware for scalability
Small: takes up few resources
Host/guest operating system
Windows Server 2008 compatible
Optimized for the virtualized environment
I/O performance equally shared between virtual machines
Windows Azure Virtualization
SLAT avoids the hypervisor intervention associated with shadow page tables (SPT)
Allows more CPU cycles to be spent on real work
Releases memory allocated for SPT
SLAT supports large page sizes (2 MB and 1 GB)
Second-Level Address Translation
The system is divided into small groups of processors (NUMA nodes)
Each node has dedicated memory (local)
Nodes can access memory residing in other nodes (remote), but with extra latency
NUMA Support
NUMA-aware for virtual machine scalability
Hypervisor schedules resources to improve performance characteristics
Assign "near" memory to the virtual machine
Select a "near" logical processor for the virtual processor
NUMA Scalability
NUMA-Aware Scheduler
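A toy NUMA-aware placement decision in the spirit of the bullets above: keep a virtual machine's memory and virtual processor on the same NUMA node when possible. The node records and sizes are assumptions, not the hypervisor's actual scheduler.

```python
# Keep a VM's memory and virtual processor "near" each other: prefer a NUMA
# node with enough free local memory and a free logical processor.
numa_nodes = [
    {"id": 0, "free_mem_gb": 1.0, "free_lps": 0},
    {"id": 1, "free_mem_gb": 4.0, "free_lps": 2},
]

def place_vm(vm_mem_gb, nodes):
    for node in nodes:
        if node["free_mem_gb"] >= vm_mem_gb and node["free_lps"] > 0:
            node["free_mem_gb"] -= vm_mem_gb
            node["free_lps"] -= 1
            return node["id"]   # local memory and processor on the same node
    return None                  # otherwise fall back to remote memory (extra latency)

print(place_vm(1.7, numa_nodes))   # -> 1
```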
Scheduler
Tuned for datacenter workloads (ASP.NET, etc.)
More predictability and fairness
Tolerates heavy I/O loads
Intercept reduction
Spin lock enlightenments
Reduced TLB flushes
VMBUS bandwidth improvements
More Hypervisor Optimizations
Automated, reliable deployment
Streamlined and consistent
Verifiable through offline provisioning
Efficient, scalable hypervisor
Maximizes CPU cycles on customer applications
Optimized for datacenter workloads
Reliable and secure virtualization
Compute instances are isolated from each other
Predictable and consistent behavior
Summary
Related PDC sessions
A Lap Around Cloud Services
Architecting Services For The Cloud
Cloud Computing: Programming In The Cloud
Related PDC labs
Windows Azure Hands-on Labs
Windows Azure Lounge
Web site: http://www.azure.com/windows
Related Content
Evals & Recordings
Please fill out your evaluation for this session at:
This session will be available as a recording at:
www.microsoftpdc.com
Please use the microphones provided
Q&A
© 2008 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market
conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.