VMworld 2013
Part 2: How to Build a Self-Healing Data Center with vCenter Orchestrator
Nicholas Colyer, Catamaran RX
Dan Mitchell, VMware
VCM5695
#VCM5695
Session Agenda
• vCenter Orchestrator Overview: a quick look at the vCenter Orchestrator platform
• VMware Example: the vCenter Operations Manager Remediation package, using vCenter Orchestrator and vCenter Operations Manager
• Customer Example: real-world use cases addressed by one customer using vCenter Orchestrator
• Partner Example: vCenter Orchestrator plug-ins from partners such as EMC that address common remediation use cases
Key Takeaways
1. Understand the concept of the self-healing data center and how vCenter Orchestrator supports it
2. Hear from a customer about their experiences today and how they will continue to take advantage of vCO remediation capabilities
3. Advice, considerations and implementation tips for real-world use cases
VMware vCenter Orchestrator Product Overview
vCenter Orchestrator Overview
Features
• Drag-and-drop design: create powerful workflows easily by dragging and dropping pre-built actions
• Cloud scalability: execute hundreds to thousands of workflows in parallel to meet cloud scale
• Flexible triggers: launch workflows from the vSphere Web Client, vCloud Automation Center, a browser, a schedule, an event, or the API
• Automate VMware: 100% coverage of the vSphere and vCloud APIs, with unmatched VMware content
Key Benefits
• Integrate VMware solutions into your IT environment and processes
• Reduce IT OpEx and total cost of ownership of VMware solutions
• Automate your cloud and accelerate the transition to an “IT as a Service” model
Platform
• Plug-ins ecosystem
• Included with vSphere and vCenter Server: no extra cost, installed with vCenter out of the box (OOTB)
• Fully integrated with vCAC: trigger vCO workflows from vCAC, and use vCO to configure and extend vCAC
vCO Workflow Designer
• Drag and drop actions
• Conditional logic
• Pause, wait until, counters, etc.
• Exception handling
• Version control
• Role-based access control
• And more ...
~500 workflows and actions for vCenter Server and vCloud Director
High-level vCO Product Architecture
[Diagram] Components shown:
• vCO Platform (access points): Designer (Windows, Mac & Linux); Web Services (SOAP, REST); Operator (vSphere Web Client); vCloud Automation Center; Service Catalogs
• vCO Platform (Engine, 64-bit): Workflow Engine, Workflow Library, Webview Library; database: Oracle, MS SQL Server or PostgreSQL
• vCO Plug-ins connecting to Management Systems, IT Infrastructure and External Notifications (AMQP, SNMP)
Thousands of Out-of-the-Box Workflows & Actions
VMware Applications
• vCenter Server 4.0, 4.1, 5.0 & 5.1
• vCloud Director 1.0, 1.5 & 5.1
• vCloud Automation Center 5.1 & 5.2
• vCenter Update Manager 4.1, 5.0 & 5.1
• vCenter Chargeback 2.0
• vCenter Configuration Manager 5.5
• vCenter Orchestrator Multi-node 5.0 & 5.1
• vSphere Auto Deploy
• VMware Service Manager 9.1
• VMware Service Elasticity
• Microsoft AD & PowerShell
Standard Protocols
• AMQP / RabbitMQ
• Email (POP3)
• Email (SMTP)
• HTTP-REST
• JDBC
• SOAP
• SNMP v1, v2c, v3
• SQL
• SSH
• Telnet
• XML
Partner Applications
• BMC Atrium CMDB & Remedy – NEW
• EMC Unified Infrastructure Manager – NEW
• Infoblox NIOS – UPDATED
• Egenera PAN Manager – NEW
• Radware vDirect
• ServiceNow
• Up.time Software
• F5 Networks BigIP – NEW
• EMC ViPR – NEW
• Cisco UCS Manager 2.x – NEW
Upcoming releases
• NetApp storage
• Bluecat Networks
• VMware vCenter Network and Security
• VMware Site Recovery Manager
• HP ServiceManager
Optimized for Growing Clouds: Orchestration HA and Dynamic Elasticity!
Overview
• Improve scalability & availability
• Built-in HA & clustering
• Support for external load balancers
• Extend the vCO REST API to vCO server installation and configuration
Benefits
• Provide higher availability
• Scale orchestration capacity along with the growth of your cloud
• Enable dynamic scale-up and scale-down of orchestration capacity
Primary Role & Use Cases for vCenter Orchestrator
VMware Cloud Automation
[Diagram layers: vCloud Automation Center (IaaS & DaaS Automation); vCenter Orchestrator IT Process Automation; Fabric Management Automation; Infrastructure Integration with CMDB, DNS, IPAM, load balancers, service desk, monitoring systems, databases, web services, etc.]
Some use cases:
• Automation of vSphere administrative tasks
• Remediation of infrastructure failures
• Automation of general IT admin tasks
VMware Example: vCenter Operations Manager with vCenter Orchestrator Automated Remediation
vCenter Operations Remediation Workflow Package
What is its purpose?
• The vCenter Operations Manager Remediation Workflow Package launches remediation workflows in vCenter Orchestrator in response to alerts received from vCenter Operations Manager
Requirements on which the solution is based
• Launch workflows when vCenter Operations alerts are received
• Keep the solution simple, requiring no programming or scripting from the user
• Let the user launch any workflow, from the library or of their own creation, in response to an alert
• Make it easily configurable
• Let the user filter incoming events based on different alert properties
vCenter Operations Remediation Workflow Package
What do I need to use it?
• vCenter Orchestrator virtual appliance (v5.1 or later)
• vCenter Orchestrator SNMP plugin
• vCenter Operations integration package
• vCenter Operations Manager
How does it work?
• vCenter Operations Manager sends SNMP traps to vCenter Orchestrator
• vCenter Orchestrator acts on the appropriate traps by executing workflows
vCenter Orchestrator + vCOps Remediation
1. vCenter health and operational data is continually passed to vCOps for analysis
2. When vCOps identifies an operational issue, it sends an SNMP trap to vCO, triggering a vCO policy to process the trap
3. vCO verifies that the incoming trap is mapped to an alert definition
4. vCO checks the filter conditions defined for the trap
5. vCO launches the appropriate remediation workflow
6. The vCO remediation workflow corrects the operational issue
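Steps 3 to 5 amount to a lookup followed by a filter check. The sketch below models that logic in plain JavaScript (the language vCO scripting uses) so it can run outside vCO; it is not the package's actual code. The alert type string, the `criticality` property, the workflow name and the `launch` callback are all hypothetical stand-ins for the package's configuration and for vCO's workflow execution.

```javascript
// Hypothetical alert definitions: alert type -> workflow to run + filter conditions.
// In the remediation package this mapping is configuration, not code.
const alertDefinitions = new Map([
  [
    "datastore.capacity.remaining",
    { workflow: "Datastore capacity remediation", filters: { criticality: "critical" } },
  ],
]);

// Returns the name of the launched workflow, or null if the trap was ignored.
function processAlert(alert, launch) {
  // Step 3: is the incoming trap mapped to an alert definition?
  const definition = alertDefinitions.get(alert.type);
  if (!definition) {
    return null;
  }
  // Step 4: every filter condition must match the corresponding alert property.
  const matches = Object.entries(definition.filters).every(
    ([property, expected]) => alert[property] === expected
  );
  if (!matches) {
    return null;
  }
  // Step 5: launch the mapped remediation workflow, passing the alert as input.
  launch(definition.workflow, alert);
  return definition.workflow;
}

const launched = [];
const launchedName = processAlert(
  { type: "datastore.capacity.remaining", criticality: "critical", resource: "DS01" },
  (workflow, alert) => launched.push(`${workflow}: ${alert.resource}`)
);
console.log(launchedName, launched);
```

Keeping the mapping and filters as data is what lets users attach any workflow to an alert without scripting, one of the package's stated requirements.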
Example Use Case: Identify a Datastore Capacity Issue
[Screenshot: datastore running out of capacity]
Example Use Case: Identify Powered Off VMs
[Screenshot: powered-off VMs on the datastore]
vCenter Operations Alerts Trigger Outbound Notification
[Screenshot: alerts trigger outbound notification via email and SNMP traps]
vCenter Orchestrator SNMP Trap Policy Workflow
[Screenshot callouts: the SNMP trap policy workflow; sample code that starts the remediation workflow if a capacity-remaining alert is received]
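The sample code on this slide is only a screenshot, so here is a hedged sketch of the same decision in plain JavaScript. It assumes the trap's variable bindings have already been read into an array of `{ oid, value }` objects; both OIDs and the alert text are placeholders, not the real vCenter Operations MIB values, and `capacityAlertTarget` is a hypothetical helper name.

```javascript
// Placeholder OIDs (hypothetical): look up the real ones in the vCenter Operations MIB.
const OID_ALERT_NAME = "1.3.6.1.4.1.99999.1.1";
const OID_RESOURCE_NAME = "1.3.6.1.4.1.99999.1.2";

// Turn the trap's variable bindings into an OID -> value lookup.
function varbindsToMap(varbinds) {
  const fields = new Map();
  for (const { oid, value } of varbinds) {
    fields.set(oid, value);
  }
  return fields;
}

// Returns the resource to remediate if this is a capacity-remaining alert, else null.
function capacityAlertTarget(varbinds) {
  const fields = varbindsToMap(varbinds);
  const alertName = String(fields.get(OID_ALERT_NAME) ?? "").toLowerCase();
  if (!alertName.includes("capacity remaining")) {
    return null;
  }
  return fields.get(OID_RESOURCE_NAME) ?? null;
}

const trap = [
  { oid: OID_ALERT_NAME, value: "Capacity remaining is low" },
  { oid: OID_RESOURCE_NAME, value: "DS01" },
];
console.log(capacityAlertTarget(trap)); // prints DS01
```

In the real policy workflow, a non-null result is the point at which the remediation workflow would be started.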
Automate Remediation Using vCenter Orchestrator Workflows
[Screenshot callouts: a workflow that lists powered-off VMs and VM snapshots to resolve the capacity issue, then prepares a report and sends an email notification]
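The report step can be sketched as a pure function over VM records. The record shape and `buildCapacityReport` are hypothetical; in vCO the records would come from the vCenter plug-in's inventory objects. Only the `powerState` values follow the vSphere `VirtualMachinePowerState` enumeration.

```javascript
// Hypothetical VM records: { name, powerState, snapshots: [snapshot names] }.
// powerState uses vSphere VirtualMachinePowerState values ("poweredOn", "poweredOff", "suspended").
function buildCapacityReport(datastoreName, vms) {
  const poweredOff = vms.filter((vm) => vm.powerState === "poweredOff");
  const withSnapshots = vms.filter((vm) => vm.snapshots.length > 0);

  // Plain-text email body listing reclaim candidates.
  const lines = [`Capacity report for datastore ${datastoreName}`, "", "Powered-off VMs:"];
  for (const vm of poweredOff) {
    lines.push(`  ${vm.name}`);
  }
  lines.push("", "VMs with snapshots:");
  for (const vm of withSnapshots) {
    lines.push(`  ${vm.name}: ${vm.snapshots.join(", ")}`);
  }

  return {
    poweredOff: poweredOff.map((vm) => vm.name),
    withSnapshots: withSnapshots.map((vm) => vm.name),
    body: lines.join("\n"),
  };
}

const report = buildCapacityReport("DS01", [
  { name: "web01", powerState: "poweredOn", snapshots: ["pre-patch"] },
  { name: "old-test", powerState: "poweredOff", snapshots: [] },
  { name: "legacy-app", powerState: "poweredOff", snapshots: ["before-upgrade", "baseline"] },
]);
console.log(report.body);
```

The workflow only reports; deleting VMs or snapshots is left to a human, which fits the "know when to give up" advice later in the session.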
Email Notification from the Datastore Remediation Workflow
[Screenshot: email listing powered-off VMs and associated snapshots]
Customer Example: CatamaranRX
Nick Colyer, Team Lead, Server Engineering, CatamaranRX
Customer Examples of Automation – Nick Colyer, CatamaranRX
Who is Nick Colyer?
• Brief History
• Blog: v-nick.com
• Twitter: @vNickC
How I got into automation
My Examples:
• Example #1: Self-healing, automating configurations
• Example #2: Self-healing, automating incident responses
Customer Example 1: Automating Configuration of HA and DRS Settings
Start with a goal in mind:
“I want to make sure that my ESXi clusters are checked every day to ensure HA is on and DRS is fully automated.”
Break it down: HA settings
1. Admission control settings
2. Enable host monitoring
Break it down: DRS settings
3. Set DRS to fully automated
4. Ensure other settings (affinity rules, etc.) remain!
Customer Example 1: Building the Workflow
1. Feed in clusters
2. Run corrective action
3. Repeat for every cluster in your environment
4. Schedule the workflow to run every night
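Steps 1 to 3 reduce to a loop over every cluster; step 4 is handled by the vCO scheduler rather than by code. This sketch uses hypothetical plain-object clusters and a stand-in corrective action. It also checks compliance before acting so the nightly log lists only clusters that actually drifted; that check is a design choice of the sketch, whereas the customer's action simply re-applies the settings to every cluster.

```javascript
// Hypothetical cluster records; in vCO these would be vCenter cluster inventory objects.
// Desired state from the goal: HA on, DRS on and fully automated.
function isCompliant(cluster) {
  return cluster.haEnabled && cluster.drsEnabled && cluster.drsBehavior === "fullyAutomated";
}

// Steps 1-3: feed in every cluster and run the corrective action on each non-compliant one.
// Returns the names of the clusters that were fixed (handy for a nightly report).
function remediateClusters(clusters, correctiveAction) {
  const fixed = [];
  for (const cluster of clusters) {
    if (!isCompliant(cluster)) {
      correctiveAction(cluster);
      fixed.push(cluster.name);
    }
  }
  return fixed;
}

// Stand-in for the "Enable HA/DRS" action (hypothetical; it only edits the record).
function enableHaDrs(cluster) {
  cluster.haEnabled = true;
  cluster.drsEnabled = true;
  cluster.drsBehavior = "fullyAutomated";
}

const clusters = [
  { name: "Prod-01", haEnabled: true, drsEnabled: true, drsBehavior: "fullyAutomated" },
  { name: "Prod-02", haEnabled: false, drsEnabled: true, drsBehavior: "manual" },
];
const fixed = remediateClusters(clusters, enableHaDrs);
console.log(fixed);
```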
Customer Example 1: Create a Reusable Action Item
Create a scriptable task in a workflow, or an action.
Action Item: Enable HA/DRS JavaScript
1. Calculate HA % based on the number of hosts
//Get all the hosts in the cluster
var Hosts = System.getModule("com.vmware.library.vc.cluster").getAllHostSystemsOfCluster(cluster);
System.log("Number of Hosts in Cluster: " + Hosts.length);
//Calculate HA percentage to tolerate 1 host worth of resources being offline,
//rounded to a whole number (kept numeric rather than a string)
var HApercent = Math.round((1 / Hosts.length) * 100);
//Log it
System.log("HA Percent which will be used for cluster is: " + HApercent);
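The percentage calculation is plain arithmetic, so it can be checked outside vCO. The sketch below is a variation, not the customer's code: it generalizes to tolerating `failures` hosts (a hypothetical parameter; the slide tolerates one) and assumes equally sized hosts. It rounds up with `Math.ceil` rather than to the nearest whole number, because rounding down (3 hosts gives 33%) reserves slightly less than one host's share.

```javascript
// Percentage of cluster CPU and memory to reserve so that `failures` equally sized hosts can fail.
// Multiplying before dividing keeps exact cases exact (no floating-point noise before Math.ceil),
// and Math.ceil keeps the reserve from falling below the exact share (3 hosts -> 34, not 33).
function haFailoverPercent(hostCount, failures = 1) {
  if (!Number.isInteger(hostCount) || !Number.isInteger(failures) || failures < 1 || hostCount <= failures) {
    throw new RangeError("need at least one tolerated failure and more hosts than failures");
  }
  return Math.ceil((failures * 100) / hostCount);
}

for (const hosts of [2, 3, 4, 8]) {
  console.log(`${hosts} hosts -> ${haFailoverPercent(hosts)}%`);
}
```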
2. Turn on HA and DRS (partial code)
var clusterConfigSpec = new VcClusterConfigSpecEx();
clusterConfigSpec.drsConfig = new VcClusterDrsConfigInfo();
clusterConfigSpec.dasConfig = new VcClusterDasConfigInfo();
//Enable DRS/HA
System.log("Setting HA and DRS to Enabled (even if they were already)");
clusterConfigSpec.dasConfig.enabled = true;
clusterConfigSpec.drsConfig.enabled = true;
//Reconfigure the cluster. Passing true as the second argument (modify) applies the spec incrementally, so existing settings not in the spec remain
System.log("Executing Cluster Reconfiguration for " + cluster.name);
var task = cluster.reconfigureComputeResource_Task(clusterConfigSpec, true);
IMPORTANT!
If you don’t pass true as the second argument, the reconfiguration will remove all your other existing HA/DRS settings, e.g. affinity rules.
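The warning above concerns the second argument of `reconfigureComputeResource_Task`, which the vSphere API calls `modify`. The toy model below is plain JavaScript, not the vSphere API, and flattens cluster settings into one object; it only shows why omitting the flag loses affinity rules. With `modify = true` the spec is applied incrementally, while with `false` the configuration is made to match the spec exactly.

```javascript
// Toy model of the `modify` flag (not the real API); settings are flattened for illustration.
function applyClusterSpec(current, spec, modify) {
  if (modify) {
    // modify = true: only properties present in the spec change; everything else is preserved.
    return { ...current, ...spec };
  }
  // modify = false: the spec replaces the configuration; anything it omits is lost
  // (unset or default in vSphere; simply dropped in this model).
  return { ...spec };
}

const existing = { haEnabled: false, drsEnabled: false, affinityRules: ["keep-db-nodes-apart"] };
const spec = { haEnabled: true, drsEnabled: true };

console.log(applyClusterSpec(existing, spec, true).affinityRules); // prints [ 'keep-db-nodes-apart' ]
console.log(applyClusterSpec(existing, spec, false).affinityRules); // prints undefined
```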
Customer Example 1: Workflow in Action
Video example: http://bit.ly/vco-hadrs-automate
Customer Example 2: Automation in Response to an Event
Start with a goal in mind:
“Enable repeatable scripted actions to be initiated in response to an SNMP trap.”
Customer Example 2: Breaking It Down…
1. vCenter raises a critical alarm for a datastore over 95% full
2. Send a trap to vCO
3. Run Storage DRS on the storage pool
Customer Example 2: How Do We Achieve This…
1. Configure the SNMP trap receiver on vCenter Orchestrator
• http://blogs.vmware.com/orchestrator/2011/09/snmp-plug-in-integration-with-vcenter.html
• http://www.vcoportal.de/2012/05/integrate-vcops-and-vco/
2. Create a workflow that interprets traps
3. Create workflows for repeatable, automated corrective actions:
a. Locate the datastore cluster that the datastore is a member of
b. Execute SDRS
c. Going further: auto-provision a LUN from the SAN
Customer Example 2: Master Workflow That Feeds Corrective Action Workflows
1. Scriptable task to interpret trap data
2. Decision: does the trap contain something we know how to handle?
3. Run corrective action
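The master workflow's decision step behaves like a lookup table of known trap types. This is a minimal sketch, assuming the trap has already been parsed into `{ alarm, target }`; the registry and the action text are hypothetical, and "Datastore usage on disk" (the name of a default vCenter alarm) is used only as an example key. A `Map` is used so arbitrary alarm names cannot collide with built-in object properties.

```javascript
// Hypothetical registry of corrective actions, keyed by the alarm name carried in the trap.
const correctiveActions = new Map([
  ["Datastore usage on disk", (target) => `Run SDRS for the pod containing ${target}`],
]);

function handleTrap(trap) {
  // 1. Interpret trap data (assumed already parsed into { alarm, target }).
  const action = correctiveActions.get(trap.alarm);
  // 2. Does the trap contain something we know how to handle?
  if (!action) {
    return { handled: false, result: `No handler for "${trap.alarm}"; notify an administrator` };
  }
  // 3. Run the corrective action.
  return { handled: true, result: action(trap.target) };
}

console.log(handleTrap({ alarm: "Datastore usage on disk", target: "DS01" }).result);
console.log(handleTrap({ alarm: "Host CPU usage", target: "esx01" }).result);
```

Adding a new self-healing case then means adding one entry and one corrective-action workflow, without touching the master workflow's logic.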
Customer Example 2: Run SDRS Workflow in Detail
1. Search vCenter for a datastore with the same name as the one in the trap.
2. Check the SDRS pods (datastore clusters) to see whether they contain the datastore object.
3. Refresh storage recommendations:
// m is assumed to be the vCenter StorageResourceManager (see the full script)
task = m.refreshStorageDrsRecommendation(podToRunSDRSOn);
Full script on my website: v-nick.com
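Steps 1 and 2 are a name match followed by a membership search. The sketch below uses hypothetical plain-object pods and a hypothetical `findPodForDatastore` helper. It assumes datastore names are unique across the vCenter inventory; vCenter only enforces uniqueness within a datacenter, so a real workflow may need to check for duplicates.

```javascript
// Hypothetical inventory: datastore clusters (SDRS pods) with their member datastores.
function findPodForDatastore(pods, datastoreName) {
  for (const pod of pods) {
    // Step 1: the trap carries only a name, so match datastores by name.
    // Step 2: return the first pod that contains a matching datastore.
    if (pod.datastores.some((ds) => ds.name === datastoreName)) {
      return pod;
    }
  }
  return null;
}

const pods = [
  { name: "Gold-Pod", datastores: [{ name: "DS01" }, { name: "DS02" }] },
  { name: "Silver-Pod", datastores: [{ name: "DS10" }] },
];
const pod = findPodForDatastore(pods, "DS10");
console.log(pod ? `Refresh SDRS recommendations on ${pod.name}` : "Datastore is not in an SDRS pod");
```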
Customer Example 2: Workflow in Action
Video example: http://bit.ly/vco-sdrs-automate
Taking It to the Next Level…
1. Instead of just running SDRS, create a workflow to auto-provision storage from the array when the free space in an SDRS pool drops below a threshold
2. Have a workflow that automatically creates the change order, but waits for someone to release the workflow
3. Corrective actions from other monitoring systems
• e.g. when SolarWinds/SCOM reports that a drive on a Windows Server 2008 machine is below a critical free-space level, vCO can automatically expand the disk in vSphere and then expand the volume inside the OS.
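Ideas 1 and 2 combine into one decision: compare the pod's free space with a threshold, then either provision a LUN or create the change order and wait for approval. The field names, the 10% default threshold, the action labels and `planStorageAction` itself are hypothetical.

```javascript
// Hypothetical pod record: { name, capacityGB, freeGB }.
function planStorageAction(pod, { minFreePercent = 10, requireApproval = true } = {}) {
  const freePercent = (pod.freeGB / pod.capacityGB) * 100;
  if (freePercent >= minFreePercent) {
    return { pod: pod.name, action: "none", freePercent };
  }
  // Below the threshold: provision a new LUN, or (idea 2) create the change order
  // and wait for someone to release the workflow.
  return { pod: pod.name, action: requireApproval ? "await-approval" : "provision-lun", freePercent };
}

const gold = { name: "Gold-Pod", capacityGB: 4000, freeGB: 200 };
console.log(planStorageAction(gold).action); // prints await-approval
```

Defaulting to approval keeps a human in the loop until the automation has earned trust; flipping `requireApproval` later changes no other logic.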
Being Successful in a Corporate Environment
How do you start?
• Get upper leadership to buy into the idea of automation.
• Standardize > Write Procedures > Automate
• Adopt an automate-first approach
Develop a team that will become “Stewards” of vCO
• Empower others to automate
Keep it simple
• Reuse existing code
• Look at the built-in workflows
Know what other tools in your environment can integrate with vCO (e.g. ServiceNow)
Partner Example: EMC Unified Infrastructure Manager
vCenter Orchestrator + EMC Unified Infrastructure Manager plug-in
Use Case 1: vSphere cluster capacity at maximum, need to add a host
• Virtual machines are running slowly, and you find that the hosts are overloaded and running low on CPU and memory
• The VMware administrator can add a new server to the UIM VDI service, making it available to vCenter as a new host, through either the vCO interface or the vSphere Web Client
Use Case 2: Low remaining capacity on an Oracle database server
• An Oracle database is running out of storage, which could impact the availability of production applications
• The VMware administrator can add storage array LUNs to the UIM Oracle service, making them available as datastores within the vCenter cluster, through either the vCO interface or the vSphere Web Client
Advice, Considerations and Tips
Map out your process
• Before trying to automate anything, map out step by step how YOU would fix the problem
Factor in alert storms
• Design your workflows to be aware of their own active instances to prevent overlapping runs
Know when to give up
• Remediation workflows only know as much as you teach them. If fixing an issue goes beyond the capabilities of your workflows, add notifications that tell you when manual intervention is required
Establish credibility with the low-hanging fruit
• Start by fixing the easy stuff: stray snapshots, remounting datastores
Don’t reinvent the wheel!
• Leverage the established community of vCenter Orchestrator experts; many have example workflows and packages to offer!
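One way to factor in alert storms is a guard keyed by the affected object: refuse to start a remediation while one is already running for that object, or within a cooldown after the last one finished. This is a sketch of that technique in plain JavaScript with hypothetical names (`RemediationGuard`, `tryStart`, `finish`); in vCO the state would need to live somewhere that persists across workflow runs (a configuration element, for example), which is not shown.

```javascript
// Alert-storm guard: at most one active remediation per object, plus a cooldown window.
class RemediationGuard {
  constructor(cooldownMs) {
    this.cooldownMs = cooldownMs;
    this.active = new Set(); // keys with a remediation currently running
    this.lastFinished = new Map(); // key -> time (ms) the last remediation finished
  }

  // Returns true (and marks the key active) if a remediation for `key` may start at `now`.
  tryStart(key, now) {
    if (this.active.has(key)) return false;
    const last = this.lastFinished.get(key);
    if (last !== undefined && now - last < this.cooldownMs) return false;
    this.active.add(key);
    return true;
  }

  // Record that the remediation for `key` finished at `now`.
  finish(key, now) {
    this.active.delete(key);
    this.lastFinished.set(key, now);
  }
}

const guard = new RemediationGuard(60000);
console.log(guard.tryStart("DS01", 0), guard.tryStart("DS01", 1000)); // prints true false
```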
Questions?
VMworld on Social Media
@startswithv – Dan M
#CloudMgmt
#CloudAutomation
#VMworld
Summary: SDDC Delivers Transformational Benefits
• Choice, Any App Anywhere: support for over 500 ISV solutions and 80 operating systems
• Cloud Service Provider Economics: reduce IT capex by 75% and opex by 56%*
• Control, Cloud on Your Terms: reduce downtime for tier 1 applications by 36%*
• Agility, Apps at Business Speed: increase IT productivity by 67%*
* Claims being validated by the Taneja Group (final numbers expected August 2013).
Start Your Journey with the VMware SDDC Today
Other VMware Activities Related to This Session
HOL: HOL-SDC-1307, vCloud Automation Solutions
THANK YOU