ES19 Under the Hood: Inside the Cloud Computing Hosting Environment
TRANSCRIPT
Under the Hood: Inside The Cloud Computing Hosting Environment
Erick Smith, Development Manager, Microsoft Corporation
Chuck Lenzmeier, Architect, Microsoft Corporation
ES19
Introduce the fabric controller
Introduce the service model
Give some insight into how it all works
Describe the workings at the data center level
Then zoom in to a single machine
Purpose Of This Talk/Agenda
Resource allocation
Machines must be chosen to host roles of the service
Fault domains, update domains, resource utilization, hosting environment, etc.
Procure additional hardware if necessary
IP addresses must be acquired
Provisioning
Machines must be set up
Virtual machines created
Applications configured
DNS set up
Load balancers must be programmed
Upgrades
Locate appropriate machines
Update the software/settings as necessary
Only bring down a subset of the service at a time
Maintaining service health
Software faults must be handled
Hardware failures will occur
Logging infrastructure is provided to diagnose issues
This is ongoing work…you’re never done
Deploying A Service Manually
Windows Azure Fabric Controller
[Diagram: a highly-available Fabric Controller manages the data center through the switches and load-balancers, using out-of-band communication for hardware control and in-band communication for software control. Each node runs the WS08 hypervisor hosting service-role VMs plus a control VM (WS08) running the agent; a node can be a VM or a physical machine.]
Fabric Controller (FC)
Maps declarative service specifications to available resources
Manages service life cycle starting from bare metal
Maintains system health and satisfies the SLA
What's special about it
Model-driven service management
Enables utility-model shared fabric
Automates hardware management
Windows Azure Automation
[Diagram: the service owner describes "what" is needed; the Fabric Controller makes it happen across the fabric of switches and load-balancers.]
Owns all the data center hardware
Uses the inventory to host services
Similar to what a per-machine operating system does with applications
The FC provisions the hardware as necessary
Maintains the health of the hardware
Deploys applications to free resources
Maintains the health of those applications
Fabric Controller
Modeling Services
Fundamental service model elements: load balancer channel, endpoint, interface, directory, resource, load balancer
[Diagram: a service template automatically maps to the service model; the public Internet reaches the service through a load balancer, which fronts a front-end web role connected to a background process role.]
The topology of your service
The roles and how they are connected
Attributes of the various components
Operating system features required
Configuration settings
Describe exposed interfaces
Required characteristics
How many fault/update domains you need
How many instances of each role
What You Describe In Your Service Model…
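To make the description above concrete, here is a rough sketch of such a declarative model. This is not the actual Windows Azure service definition format; every name and field in it is hypothetical.

```python
# Hypothetical sketch of a declarative service model; not the real
# Windows Azure service definition schema, just the kinds of facts it captures.
service_model = {
    "name": "MyService",
    "roles": {
        "FrontEndWeb": {
            "os_features": ["IIS7"],                  # operating system features required
            "config_settings": ["ConnectionString"],  # declared by developer, set by deployer
            "endpoints": [{"name": "HttpIn", "protocol": "http", "port": 80}],
            "instances": 4,                           # how many instances of this role
        },
        "BackgroundProcess": {
            "os_features": [],
            "config_settings": ["QueueName"],
            "endpoints": [],                          # no exposed interface
            "instances": 2,
        },
    },
    "channels": [("FrontEndWeb", "BackgroundProcess")],  # topology: who talks to whom
    "fault_domains": 2,                                  # required characteristics
    "update_domains": 2,
}
```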
Allows you to specify what portion of your service can be offline at a time
Fault domains are based on the topology of the data center
Switch failure
Statistical in nature
Update domains are determined by what percentage of your service you will take out at a time for an upgrade
You may experience outages for both at the same time
System considers fault domains when allocating service roles
Example: don't put all roles in the same rack
System considers update domains when upgrading a service
Fault/Update Domains
[Diagram: role instance allocation is spread across fault domains.]
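As a toy illustration of the two concepts, the snippet below spreads four instances of one role across two fault domains and two update domains, so that neither a rack failure nor an upgrade step takes down more than half of the role. The layout is an assumption for illustration only.

```python
# Toy layout: 4 instances of one role over 2 fault domains x 2 update domains.
instances = ["web_0", "web_1", "web_2", "web_3"]
placement = {
    inst: {"fault_domain": i % 2, "update_domain": i // 2}
    for i, inst in enumerate(instances)
}
# web_0 -> FD 0/UD 0, web_1 -> FD 1/UD 0, web_2 -> FD 0/UD 1, web_3 -> FD 1/UD 1:
# a switch failure in one rack or an upgrade of one update domain hits at most two instances.
```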
Purpose: communicate settings to service roles
There is no "registry" for services
Application configuration settings
Declared by developer
Set by deployer
System configuration settings
Pre-declared, same kinds for all roles
Instance ID, fault domain ID, update domain ID
Assigned by the system
In both cases, settings accessible at run time
Via call-backs when values change
Dynamic Configuration Settings
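A minimal sketch of the settings-plus-call-backs idea, assuming a simple in-memory environment object. This is not the real Windows Azure runtime API; the class and method names are invented for illustration.

```python
# Invented stand-in for the runtime settings mechanism; not the Azure SDK.
class RoleEnvironment:
    def __init__(self, settings):
        self._settings = dict(settings)   # application + system settings
        self._listeners = []

    def get_setting(self, name):
        return self._settings[name]

    def on_changed(self, callback):
        # register a call-back invoked when a value changes at run time
        self._listeners.append(callback)

    def apply_change(self, name, value):
        # e.g. the deployer edits an application setting
        self._settings[name] = value
        for cb in self._listeners:
            cb(name, value)

env = RoleEnvironment({"ConnectionString": "initial", "InstanceId": "web_0"})
env.on_changed(lambda name, value: print(f"setting {name} changed to {value}"))
env.apply_change("ConnectionString", "updated")
```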
Windows Azure Service Lifecycle
Goal is to automate the life cycle as much as possible
Coding & Modeling: new services and updates (Developer)
Provisioning: desired configuration (Developer/Deployer)
Deployment: mapping and deploying to actual hardware; network configuration (Automated)
Later stages of the life cycle are likewise automated
Resource allocation
Nodes are chosen based on constraints encoded in the service model
Fault domains, update domains, resource utilization, hosting environment, etc.
VIPs/LBs are reserved for each external interface described in the model
Provisioning
Allocated hardware is assigned a new goal state
FC drives hardware into the goal state
Upgrades
FC can upgrade a running service
Maintaining service health
Software faults must be handled
Hardware failures will occur
Logging infrastructure is provided to diagnose issues
Lifecycle Of A Windows Azure Service
Primary goal: find a home for all role instances
Essentially a constraint satisfaction problem (a toy allocation sketch follows below)
Allocate instances across "fault domains"
Example constraints include:
Only roles from a single service can be assigned to a node
Only a single instance of a role can be assigned to a node
Node must contain a compatible hosting environment
Node must have enough resources remaining
Service model allows for simple hints as to the resources the role will utilize
Node must be in the correct fault domain
Nodes should only be considered if healthy
A machine can be sub-partitioned into VMs
Performed as a transaction
Resources Come From Our Shared Pool
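The snippet below is a greedy toy allocator that checks the example constraints listed above for each instance. The real FC solves this as a constraint satisfaction problem performed as a transaction, so treat this only as an illustration with assumed field names.

```python
# Greedy toy allocator over the example constraints; nodes and instances are
# plain dicts, node["roles"] is a set, and resources are a single number.
def satisfies(node, inst):
    return (
        node["healthy"]                                        # only healthy nodes considered
        and node["hosting_env"] == inst["hosting_env"]         # compatible hosting environment
        and node["free_resources"] >= inst["resources"]        # enough resources remaining
        and node["service"] in (None, inst["service"])         # one service per node
        and inst["role"] not in node["roles"]                  # one instance of a role per node
        and node["fault_domain"] == inst["fault_domain"]       # correct fault domain
    )

def allocate(role_instances, nodes):
    placement = {}
    for inst in role_instances:
        node = next((n for n in nodes if satisfies(n, inst)), None)
        if node is None:
            raise RuntimeError("no home found; roll back the whole transaction")
        node["service"] = inst["service"]
        node["roles"].add(inst["role"])
        node["free_resources"] -= inst["resources"]
        placement[inst["name"]] = node["name"]
    return placement
```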
Key FC Data Structures
[Diagram: the FC's object model relates Service Description, Role Description, and Role Instance Description to Logical Service, Logical Role, Logical Role Instance, and Logical Node, which maps onto a Physical Node.]
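One plausible shape for a few of these structures, written as dataclasses. The fields are inferred from the slide and from the node-state discussion that follows, not from the actual FC object model.

```python
# Illustrative dataclasses for the structures named on the slide; the fields
# are assumptions, not the FC's actual object model.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class RoleInstanceDescription:
    role_name: str
    instance_id: str
    fault_domain: int
    update_domain: int

@dataclass
class LogicalNode:
    node_id: str
    goal_state: str = "unassigned"
    current_state: str = "unknown"
    instances: List[RoleInstanceDescription] = field(default_factory=list)

@dataclass
class PhysicalNode:
    machine_id: str
    healthy: bool = True
    logical_node: Optional[LogicalNode] = None
```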
Maintaining Node State
[Diagram: each Logical Role Instance and each Logical Node tracks a goal state and a current state; the Logical Node maps to a Physical Node.]
FC maintains a state machine for each node
Various events cause the node to move into a new state
FC maintains a cache about the state it believes each node to be in
State reconciled with the true node state via communication with the agent
Goal state derived based on assigned role instances
On a heartbeat event the FC tries to move the node closer to its goal state (if it isn't already there)
FC tracks when the goal state is reached
Certain events clear the "in goal state" flag
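A sketch of the heartbeat-driven reconciliation loop described above; the state names and the single-step transition table are assumptions, not the FC's real state machine.

```python
# Heartbeat handler sketch: refresh the cached view from the agent, then
# nudge the node one (assumed) step toward its goal state.
STEP_TOWARD = {
    ("bare", "ready"): "provisioning",
    ("provisioning", "ready"): "ready",
    ("ready", "running_role"): "running_role",
}

def on_heartbeat(node):
    node["current_state"] = node["agent_reported_state"]   # reconcile with true state
    if node["current_state"] == node["goal_state"]:
        node["in_goal_state"] = True
        return
    node["in_goal_state"] = False                            # certain events clear the flag
    next_state = STEP_TOWARD.get((node["current_state"], node["goal_state"]))
    if next_state is not None:
        node["current_state"] = next_state                   # drive toward the goal state
```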
The FC Provisions Machines…
Virtual IPs (VIPs) are allocated from a pool
Load balancer (LB) setup
VIP and dedicated IP (DIP) pools are programmed automatically
DIPs are marked in/out of service as the FC's belief about the state of role instances changes
LB probing is set up to communicate with the agent on the node, which has real-time info on the health of the role
Traffic is only routed to roles ready to accept traffic
Routing information is sent to the agent to configure routes based on network configuration
Redundant network gear is in place for high availability
…And Other Data Center Resources
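The toy fragment below captures the routing rule described above: the load balancer forwards a VIP only to the dedicated IPs whose role instances the node agent reports as ready. It is illustrative only and does not reflect an actual load-balancer API.

```python
# Toy load-balancer table: a VIP forwards only to DIPs whose instances the
# node agent reports as ready to accept traffic.
def healthy_dips(role_instances):
    return [ri["dip"] for ri in role_instances if ri["agent_says_ready"]]

def program_lb(lb_table, vip, role_instances):
    lb_table[vip] = healthy_dips(role_instances)

lb_table = {}
program_lb(lb_table, "203.0.113.10:80",
           [{"dip": "10.0.1.4", "agent_says_ready": True},
            {"dip": "10.0.1.5", "agent_says_ready": False}])
# lb_table == {"203.0.113.10:80": ["10.0.1.4"]}
```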
Windows Azure FC monitors the health of roles
FC detects if a role dies
A role can indicate it is unhealthy
Upon learning a role is unhealthy:
Current state of the node is updated appropriately
State machine kicks in again to drive us back into the goal state
Windows Azure FC monitors the health of the host
If the node goes offline, FC will try to recover it
If a failed node can't be recovered, FC migrates role instances to a new node
A suitable replacement location is found
Existing role instances are notified of the configuration change
The FC Keeps Your Service Running
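A compressed sketch of the migration path for an unrecoverable node, reusing the same kind of placement search as initial allocation; all names and the simplified replacement search are assumptions.

```python
# Sketch of migrating role instances off a node the FC could not recover.
def find_replacement(failed_node, nodes):
    # stand-in for the constraint-based search used at initial allocation
    return next(n for n in nodes if n["healthy"] and n is not failed_node)

def handle_failed_node(failed_node, nodes, notify):
    for inst in list(failed_node["instances"]):
        new_node = find_replacement(failed_node, nodes)
        failed_node["instances"].remove(inst)
        new_node["instances"].append(inst)
        notify(inst, new_node)   # existing role instances see the configuration change
```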
FC can upgrade a running service
Resources deployed to all nodes in parallel
Done by updating one "update domain" at a time
Update domains are logical and don't need to be tied to a fault domain
Goal state for a given node is updated when the appropriate update domain is reached
Two modes of operation: manual and automatic
Rollbacks are achieved with the same basic mechanism
How Upgrades Are Handled
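A short sketch of the one-update-domain-at-a-time idea; the goal-state tuple and the wait_until_healthy hook are placeholders, not the FC's actual upgrade mechanism.

```python
# One-update-domain-at-a-time upgrade (or rollback, by passing the old version).
def rolling_upgrade(nodes, new_version, wait_until_healthy):
    domains = sorted({n["update_domain"] for n in nodes})
    for ud in domains:
        batch = [n for n in nodes if n["update_domain"] == ud]
        for n in batch:
            n["goal_state"] = ("running", new_version)   # FC drives each node to this state
        wait_until_healthy(batch)   # only this slice of the service is down at a time
```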
Windows Azure provisions and monitors hardware elements
Compute nodes, TOR/L2 switches, LBs, access routers, and node OOB control elements
Hardware life cycle management
Burn-in tests, diagnostics, and repair
Failed hardware taken out of the pool
Application of automatic diagnostics
Physical replacement of failed hardware
Capacity planning
On-going node and network utilization measurements
Proven process for bringing new hardware capacity online
Behind The Scenes Work
Your services are isolated from other services
Can access resources declared in the model only
Local node resources: temp storage
Network end-points
Isolation using multiple mechanisms
Automatic application of Windows security patches
Rolling operating system image upgrades
Service Isolation And Security
[Diagram: isolation mechanisms include managed code, restriction of privileges, the firewall, the virtual machine boundary, and IP filtering.]
FC is a cluster of 5-7 replicas
Replicated state with automatic failover
New primary picks up seamlessly from a failed replica
Even if all FC replicas are down, services continue to function
Rolling upgrade support for the FC itself
FC cluster is modeled and controlled by a utility "root" FC
Windows Azure FC Is Highly Available
[Diagram: the FC agent on a client node communicates with the primary FC node; the replication system copies state from the primary (FC core, object model, disk, holding uncommitted and committed data) to the secondary FC nodes, which hold committed data.]
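A toy version of the primary/secondary replication shown in the diagram, using the diagram's committed/uncommitted vocabulary; the quorum rule and class names are assumptions rather than the FC's actual replication protocol.

```python
# Toy primary/secondary replication: an update is uncommitted on the primary
# until enough secondaries store it, then it is committed.
class SecondaryFC:
    def __init__(self):
        self.committed = {}

    def store(self, key, value):
        self.committed[key] = value
        return True   # acknowledge

class PrimaryFC:
    def __init__(self, secondaries):
        self.secondaries = secondaries
        self.uncommitted = {}
        self.committed = {}

    def update(self, key, value):
        self.uncommitted[key] = value
        acks = sum(1 for s in self.secondaries if s.store(key, value))
        if acks + 1 > (len(self.secondaries) + 1) // 2:   # assumed majority rule
            self.committed[key] = self.uncommitted.pop(key)

fc = PrimaryFC([SecondaryFC(), SecondaryFC()])
fc.update("node42.goal_state", "running")
# the update is now committed on the primary and both secondaries, so a new
# primary elected after failover can pick up from the committed state
```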
Network has redundancy built in
Redundant switches, load balancers, and access routers
Services are deployed across fault domains
Load balancers route traffic to active nodes only
Windows Azure FC state check-pointed periodically
Can roll back to previous checkpoints
Guards against corrupted FC state, loss of all replicated state, and operator errors
FC state is stored on multiple replicas across fault domains
Windows Azure Fabric Is Highly Available
PDC release
Automated service deployment
Three service templates
Support for changing the number of running instances
Simple service upgrades/downgrades
Automated service failure discovery and recovery
External VIP address/DNS name per service
Service network isolation enforcement
Automated hardware management
Includes automated network load-balancer management
For 2009
Ability to model more complex applications
Richer service life-cycle management
Richer network management
Service Life-cycle
Windows Azure automates most functions
System takes care of running and keeping services up
Service owner in control
Self-management model through the portal
Secure and highly-available platform
Built-in data center management
Capacity planning
Hardware and network management
Summary
Virtualization And Deployment
Multi-tenancy with security and isolation
Improved 'performance/watt/$' ratio
Increased operations automation
Hypervisor-based virtualization
Highly efficient and scalable
Leverages hardware advances
Virtual Computing Environment
High-Level Architecture
[Diagram: the hypervisor sits between the hardware (NIC, disks, CPU) and the partitions. The host partition runs a Server Core host OS with drivers and the virtualization stack (VSP); each guest partition runs a Server Enterprise guest OS, its virtualization stack (VSC), and the applications. Partitions communicate over the VMBUS.]
Images are virtual hard disks (VHDs)
Offline construction and servicing of images
Separate operating system and service images
Same deployment model for the root partition
Image-Based Deployment
Image-Based Deployment
[Diagram: the host partition boots from a host-partition differencing VHD layered on the HV-enabled Server Core base VHD. Each guest partition boots from a guest-partition differencing VHD layered on the Server Enterprise base VHD, together with application VHDs built from the App1, App2, and App3 packages. The node also holds the Server Core and Server Enterprise base VHDs and a maintenance OS image.]
Deployment of images is just file copy
No installation
Background process
Multicast
Image caching for quick update and rollback
Servicing is an offline process
Dynamic allocation based on business needs
Net: high availability at lower cost
Rapid And Reliable Provisioning
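The fragment below models deployment-as-file-copy over the VHD layering in the diagram: a node copies only the image layers it does not already have cached. File names and the cache representation are invented for illustration.

```python
# Deployment as file copy over layered images: copy only the layers the node
# does not already have in its cache.
node_cache = {"ServerCore-base.vhd", "ServerEnterprise-base.vhd"}

guest_image = [
    "ServerEnterprise-base.vhd",   # shared, read-only base OS VHD
    "guest-diff-web_0.vhd",        # per-partition differencing VHD
    "FrontEndWeb-app.vhd",         # application VHD, built and serviced offline
]

def deploy(image_layers, cache):
    to_copy = [layer for layer in image_layers if layer not in cache]
    cache.update(to_copy)          # plain file copy; no installation step
    return to_copy

print(deploy(guest_image, node_cache))
# -> ['guest-diff-web_0.vhd', 'FrontEndWeb-app.vhd']
```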
Tech Preview offers one virtual machine type
Platform: 64-bit Windows Server 2008
CPU: 1.5-1.7 GHz x64 equivalent
Memory: 1.7 GB
Network: 100 Mbps
Transient local storage: 250 GB
Windows Azure storage also available: 50 GB
Full service model supports more virtual machine types
Expect to see more options post-PDC
Windows Azure Compute Instance
Hypervisor
Efficient: exploits the latest processor virtualization features (e.g., SLAT, large pages)
Scalable: NUMA-aware for scalability
Small: takes up few resources
Host/guest operating system
Windows Server 2008 compatible
Optimized for the virtualized environment
I/O performance equally shared between virtual machines
Windows Azure Virtualization
SLAT avoids the hypervisor intervention associated with shadow page tables (SPT)
Allows more CPU cycles to be spent on real work
Releases memory allocated for SPT
SLAT supports large page sizes (2 MB and 1 GB)
Second-Level Address Translation
The system is divided into small groups of processors (NUMA nodes)
Each node has dedicated memory (local)
Nodes can access memory residing in other nodes (remote), but with extra latency
NUMA Support
NUMA-aware for virtual machine scalability
Hypervisor schedules resources to improve performance characteristics
Assign "near" memory to the virtual machine
Select a "near" logical processor for the virtual processor
NUMA Scalability
NUMA-Aware Scheduler
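A toy NUMA-aware placement decision in the spirit of the bullets above: keep a virtual machine's memory and virtual processor on the same NUMA node when possible. The node records and sizes are assumptions, not the hypervisor's actual scheduler.

```python
# Keep a VM's memory and virtual processor "near" each other: prefer a NUMA
# node with enough free local memory and a free logical processor.
numa_nodes = [
    {"id": 0, "free_mem_gb": 1.0, "free_lps": 0},
    {"id": 1, "free_mem_gb": 4.0, "free_lps": 2},
]

def place_vm(vm_mem_gb, nodes):
    for node in nodes:
        if node["free_mem_gb"] >= vm_mem_gb and node["free_lps"] > 0:
            node["free_mem_gb"] -= vm_mem_gb
            node["free_lps"] -= 1
            return node["id"]   # local memory and processor on the same node
    return None                  # otherwise fall back to remote memory (extra latency)

print(place_vm(1.7, numa_nodes))   # -> 1
```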
Scheduler
Tuned for datacenter workloads (ASP.NET, etc.)
More predictability and fairness
Tolerates heavy I/O loads
Intercept reduction
Spin lock enlightenments
Reduced TLB flushes
VMBUS bandwidth improvements
More Hypervisor Optimizations
Automated, reliable deployment
Streamlined and consistent
Verifiable through offline provisioning
Efficient, scalable hypervisor
Maximizes CPU cycles on customer applications
Optimized for datacenter workloads
Reliable and secure virtualization
Compute instances are isolated from each other
Predictable and consistent behavior
Summary
Related PDC sessions
A Lap Around Cloud Services
Architecting Services For The Cloud
Cloud Computing: Programming In The Cloud
Related PDC labs
Windows Azure Hands-on Labs
Windows Azure Lounge
Web site: http://www.azure.com/windows
Related Content
Evals & Recordings
Please fill out your evaluation for this session at:
This session will be available as a recording at:
www.microsoftpdc.com
Please use the microphones provided
Q&A
© 2008 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market
conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.