system automation, integration and recovery

WebMD Health Corp.: agile system automation, integration and recovery using HP Server Automation and HP Operations Orchestration
Derek Chang, Manager, WebMD
Roger Hsu, Manager, WebMD
©2010 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.

Upload: derek-chang

Post on 13-Dec-2014



TRANSCRIPT

Page 1: system automation, integration and recovery


WebMD Health Corp.: agile system automation, integration and recovery using HP Server Automation and HP Operations Orchestration

– Derek Chang, Manager, WebMD
– Roger Hsu, Manager, WebMD

Page 2: system automation, integration and recovery

Introduction & Agenda

Topics
– Who we are and where we stand
– Infrastructure Layout
– Middleware Integration
– HP OO Preparation
– Application Administration Automation
– Build Deployment Automation
– Unattended WebMD Content Backup
– Maintenance Free System
– Results from HP SA/OO Implementation
– Q/A

Page 3: system automation, integration and recovery

cmsops

Responsibility
– Provide maintenance and 24x7 support of CMS applications and their subsystems in the production environment
– Perform production system patches, bug fixes, software releases, and other build deployments
– Support ongoing releases and development in non-production environments
– Define and document production support requirements, escalation procedures, issue tracking, and guidelines for troubleshooting and build deployments

Resource: 4.5 headcounts*

Universe
– 300+ internal users
– SDLC environments: dev/devint/qa00/qa01/qa02/perf/production
– 130 servers
– 4.4 TB of NAS storage for raw contents and site contents
– Infrastructure: Zenoss, HPSA, HPOO, Serena TeamTrack, MOSS, MSSQL/Oracle

Core technology
– EMC Documentum
– Proprietary applications

Page 4: system automation, integration and recovery

Documentum

An enterprise content management platform, now delivered by EMC Corporation; also the name of the software company that originally developed the technology.
A flexible, versatile, and powerful yet complex platform

Implementation in WebMD
– 2 major portal sites
– 6 Documentum products
– Proprietary content editor for advanced features
– Proprietary page transformer
– Proprietary utilities: 15 applications

Page 5: system automation, integration and recovery

Challenges

Documentum is a new technology
Documentum expertise is rare
The CMS is complex
cmsops supports users within the company
WebMD is a fast-growing company

Page 6: system automation, integration and recovery

Life in cmsops

Sampling duration: Oct 11, 2007 – Jul 24, 2009 (653 days / 426 working days)
Source: customized TeamTrack reports and emails

Summary
– 1772 TeamTrack requests
– 479 email requests*
– 5.3 tickets/working day
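The per-day figure can be reproduced from the counts on this slide. The short check below assumes the rate counts TeamTrack and email requests together and divides by working days; the helper name `tickets_per_day` is only illustrative.

```shell
#!/bin/sh
# Tickets per working day = (TeamTrack requests + email requests) / working days.
# The three input figures come from the slide; the function name is hypothetical.
tickets_per_day() {
  awk -v tt="$1" -v mail="$2" -v days="$3" \
    'BEGIN { printf "%.1f\n", (tt + mail) / days }'
}

# Oct 11, 2007 to Jul 24, 2009: 1772 TeamTrack + 479 email over 426 working days.
tickets_per_day 1772 479 426
```

With those inputs, (1772 + 479) / 426 is about 5.28, which rounds to the 5.3 quoted on the slide.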

Page 7: system automation, integration and recovery

Our Approach

Develop and utilize process templates
Standardize and adopt the development model
Identify which processes to automate
– Routine/mundane activities
– Activities where human interaction causes errors or failures
– Processes whose lifecycle/service time is much longer than their development time

Page 8: system automation, integration and recovery


Infrastructure Layout

Page 9: system automation, integration and recovery

Infrastructure Layout

[Architecture diagram; component labels listed in the order they appear]
Opsware OO: Central and RAS, SAS OCLI client, Scheduler engine, NRAS, Web interface, Workflow engine, Repository, JRAS, SAS web services client
Build server: RHEL4u4_32BIT VM, Opsware agent, NAS/Build repository, OCLI 1.0, Web interface, Rpm/msi package tools
Opsware SAS: Central and RAS, Twister, OCLI engine, Web services engine, Web interface, Software repository, Server repository, JAVA API, Opsware agent engine
Middleware integration: JBoss 5.0, XML module, Email adapter, Email sender, Web interface, OO client, TeamTrack client, LDAP module, Data modeling
Corporate infrastructure: Active Directory, TeamTrack, Win2K3, Web interface, Business mashup engine, Web services
PAS LAB: RHEL4u6_64BIT VM, Opsware agent, App server(s), Code base
QA/DEV Clients
Exchange server

Page 10: system automation, integration and recovery


Middleware Integration

Page 11: system automation, integration and recovery

Middleware Integration

Description
– The core of the automation system
– Connects the ticketing, monitoring, and system administration tools within WebMD operations
– Provides operation tools without requiring users to access the underlying systems/tools

Page 12: system automation, integration and recovery

Middleware Integration

Ticketing system integration
– Uses web services to connect to Serena Business Mashup (TeamTrack)
– Pulls information from tickets and passes data to other systems such as HP OO
– Updates tickets after an automation operation

Page 13: system automation, integration and recovery

Middleware Integration

System Administration (HP SA/OO) integration
– A Java bean uses the OO library to trigger an OO workflow
– It parses the workflow result (XML format) to get:
• OO flow id and report URL
• Start time and end time
• OO flow response and result

// Build the invocation from the OO base URL, the flow name, and its parameters.
RSFlowInvoke rsf = new RSFlowInvoke();
rsf.setUrl(url + flowName + paraString);
rsf.setUsername(user);
rsf.setPassword(pw);
// Run the flow; the result is the XML document that the bean then parses.
result = rsf.invoke();

Page 14: system automation, integration and recovery

Middleware Integration

Web Application
– Lets users run the automation tools from a web browser over the network, so they do not need direct access to underlying systems/tools such as HP OO
– Uses Ajax and RichFaces to provide a dynamic, intuitive user experience
– Developed with the JBoss Seam framework
– Uses Hibernate as the database layer framework

Page 15: system automation, integration and recovery

Middleware Integration

Security and User Authorization
– Integrates with the WebMD LDAP servers so that users access the system with their WebMD id/password
– The JBoss Rules engine provides access control based on each user's WebMD LDAP groups

Page 16: system automation, integration and recovery


HP OO Preparation

Page 17: system automation, integration and recovery

HP OO Preparation

Identify basic/out-of-the-box OO operations
– SSH
– Windows Remote Command Execution
– Change IIS status
– Change Windows service status
– OCLI to access HP SA
– Iterator, Email CDO, etc.
– Database operations (Oracle/MSSQL)

Modularization and utility workflows
– Use OO operations to build utility workflows that will be reused frequently

Page 18: system automation, integration and recovery

HP OO Preparation

HostsSSH: run Linux commands on a list of hosts
– Give a list of hosts to Iterator (PAS out-of-box operation)
– SSH Command (PAS out-of-box operation)
– Call Error Notice flow
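The iterate, run, and notify pattern of this utility workflow can be modeled outside OO as a plain shell loop. This is a minimal sketch, not the OO flow itself: `SSH_RUNNER`, `notify_error`, and `hosts_ssh` are hypothetical names, and the error notice is reduced to a log line.

```shell
#!/bin/sh
# Sketch of the HostsSSH pattern: iterate over hosts, run one command on each,
# and hand every failure to an error-notice step.
# SSH_RUNNER is a hypothetical hook so the remote runner can be swapped out.
SSH_RUNNER=${SSH_RUNNER:-ssh}

notify_error() {
  # Stand-in for the "Error Notice" flow; here it only logs to stderr.
  echo "ERROR: command failed on $1" >&2
}

hosts_ssh() {
  hs_command=$1
  shift
  hs_failed=0
  for hs_host in "$@"; do
    if ! "$SSH_RUNNER" "$hs_host" "$hs_command"; then
      notify_error "$hs_host"
      hs_failed=$((hs_failed + 1))
    fi
  done
  # Non-zero exit status when any host failed, so a caller can react.
  [ "$hs_failed" -eq 0 ]
}
```

A call such as `hosts_ssh 'uptime' host1 host2` runs the command on each host in turn and keeps going after a failure, which is one possible reading of the flow; the slide does not say whether the real workflow stops at the first error.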

Page 19: system automation, integration and recovery

HP OO Preparation

HostsWinCommand: run Windows commands on a list of hosts

Given a list of hosts to Iterator (PAS out-of-box operation)

SSH Command (PAS out-of-box operation)

Call Error Notice flow

Page 20: system automation, integration and recovery

HP OO Preparation

IIS flows:
– HostIISSites: control multiple IIS sites on a single host
– HostsIISSites: control multiple IIS sites on multiple hosts

[Flow diagrams: "Multiple hosts, multiple sites" (given a list of hosts) and "Single host, multiple sites" (given a list of sites)]

Page 21: system automation, integration and recovery

HP OO Preparation

Windows Services flows:
– HostWinSvcsCtrl: control multiple services on a single host
– HostsWinSvcsCtrl: control multiple services on multiple hosts

[Flow diagrams: "Multiple hosts, multiple services" and "Single host, multiple services"]

Page 22: system automation, integration and recovery


Application Administration Automation

Page 23: system automation, integration and recovery

Application Administration Automation

Goal: Develop OO workflows to stop/start WebMD applications and sites

Workflow key features
– Identify target servers
– Windows: stop/start Windows services and IIS sites
– Linux: stop/start applications and run any script if needed
– Send error/success email notices

Page 24: system automation, integration and recovery

Application Administration Automation

Users pick the available host type and environment based on the permissions given to their LDAP groups

Login as consumer QA user

Consumer users are NOT allowed to pick professional hosts

QA users control QA environments only
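The two rules on this slide can be written down as a small permission check. This is only a model under stated assumptions: the group name format `<portal>-<role>` (for example `consumer-qa`) and the function `can_control` are hypothetical, and the real system evaluates LDAP groups with JBoss Rules rather than shell.

```shell
#!/bin/sh
# Model of the access rule, assuming a hypothetical LDAP group name of the
# form "<portal>-<role>", for example "consumer-qa".
can_control() {
  cc_group=$1      # e.g. consumer-qa
  cc_host_type=$2  # consumer or professional
  cc_env=$3        # e.g. qa00, qa01, production

  cc_portal=${cc_group%%-*}
  cc_role=${cc_group#*-}

  # Rule 1: users may only pick hosts that belong to their own portal.
  [ "$cc_portal" = "$cc_host_type" ] || return 1

  # Rule 2: QA users may only control QA environments.
  if [ "$cc_role" = "qa" ]; then
    case $cc_env in
      qa*) return 0 ;;
      *)   return 1 ;;
    esac
  fi

  # Other roles are not described on the slide, so this sketch denies them.
  return 1
}
```

Under these assumptions, `can_control consumer-qa consumer qa00` succeeds, while the same user is refused for `professional` hosts or for the `production` environment.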

Page 25: system automation, integration and recovery

Application Administration Automation

Users hit one of the action buttons

User hits “Query Servers”

Page 26: system automation, integration and recovery

Application Administration Automation

The web application then triggers the corresponding HP OO workflow
OO workflows connect to HP SA with OCLI
HP SA takes actions on the target hosts

Page 27: system automation, integration and recovery

Application Administration Automation

The OO workflow sends the result back to the middleware in XML format
The middleware parses the XML and displays the result in the GUI

dmas qa00 server

Page 28: system automation, integration and recovery

Application Administration Automation

Users receive email notices

Page 29: system automation, integration and recovery

Application Administration Automation

Application Administration workflows:
– Documentum Content Servers
– Documentum Application Servers
– ATS: WebMD proprietary content transformer
– PATS: WebMD proprietary content transformer
– Page Builder: WebMD proprietary content editor

Page 30: system automation, integration and recovery

Application Administration Automation

WebMD Content Servers

Decision: start or shutdown

Stop when query servers only

OCLI Query Servers based on portal, product, host type, and environment

Initiate variables based on portal

Start/stop SCS (HostsSSH)

Start/stop JMS (HostsSSH)

Start/stop doc base (HostsSSH)

Clean up doc base (HostsSSH)

Send email notice when finishes

Page 31: system automation, integration and recovery

Application Administration Automation

WebMD Application Servers
– OCLI: query the server list against SAS
– HostsSSH: run commands on each host in the list

Filter String
{device_servergroup_name equal_to "${portal}"} & {device_servergroup_name equal_to "${product}"} & {device_servergroup_name equal_to "${hostType}"} & {device_servergroup_name equal_to "${environment}"}

OCLI command
for i in `/opsw/api/com/opsware/server/ServerService/method/.findServerRefs:i filter='${filterString}'`;
do /opsw/api/com/opsware/server/ServerService/method/getServerVO self:i="$i";
done
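The filter string is one `device_servergroup_name equal_to` term per server group, joined with `&`. In the workflow, OO substitutes the `${...}` inputs itself; the sketch below only illustrates how the same string is composed. `build_filter` is a hypothetical helper and the sample group names are made up.

```shell
#!/bin/sh
# Compose the server-group filter from any number of group names.
# The term syntax is copied from the slide; build_filter is a hypothetical name.
build_filter() {
  bf_out=""
  for bf_group in "$@"; do
    bf_term="{device_servergroup_name equal_to \"$bf_group\"}"
    if [ -z "$bf_out" ]; then
      bf_out=$bf_term
    else
      bf_out="$bf_out & $bf_term"
    fi
  done
  printf '%s\n' "$bf_out"
}

# Illustrative values for portal, product, host type, and environment.
build_filter consumer documentum appserver qa00
```

Because every term must match, a server is returned only when it belongs to all four groups, which is why each server has to be assigned to a portal, product, host type, and environment group in HP SA.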

Page 32: system automation, integration and recovery


Build Deployment Automation

Page 33: system automation, integration and recovery

Build Deployment Automation

Goal: Develop an OO workflow to build an RPM and deploy it to target servers

Workflow key features:
– Identify target servers, software policy, and RPM in HP SA
– Build the RPM and upload it to HP SA
– Stop/start applications on target servers
– Detach/attach software policies and remediate target servers
– Update the RPM in software policies

Page 34: system automation, integration and recovery

Build Deployment Automation

Workflow inputs:
– Portal
– Product
– Host Type
– Application
– Environment
– Build Version

Page 35: system automation, integration and recovery

Build Deployment Automation

Identify target servers
– Set up server groups in HP SA: portal groups, product groups, host type groups, and environment groups; then assign servers to the appropriate groups

[Screenshot labels: Host type group, Product group, Portal group, Environment group]

Page 36: system automation, integration and recovery

Build Deployment Automation

Identify target servers (Cont.)
– Use the OO SSH operation to execute an OCLI command that gets the SAS server list
• OCLI: findServerRefs and getServerVO in the server service
• Filter: use the aforementioned server groups as the filter

Filter String
{device_servergroup_name equal_to "${portal}"} & {device_servergroup_name equal_to "${product}"} & {device_servergroup_name equal_to "${hostType}"} & {device_servergroup_name equal_to "${environment}"}

OCLI command
for i in `/opsw/api/com/opsware/server/ServerService/method/.findServerRefs:i filter='${filterString}'`;
do /opsw/api/com/opsware/server/ServerService/method/getServerVO self:i="$i";
done

Page 37: system automation, integration and recovery

Build Deployment Automation

Identify software policy & RPM
– Software policy naming in HP SA: {Application} – {Environment}
– Use the findSoftwarePolicyRefs OCLI command to identify the software policy
– Use the findRPMRefs OCLI command to identify the RPM

Page 38: system automation, integration and recovery

Build Deployment Automation

Build RPM and upload it to HP SA
– Required parameters: application and build version
– A Perl application on Apache builds the RPM
– The client sends an HTTP request with the parameters to trigger the Perl application
– The RPM is uploaded to HP SA with OCLI 1.0
– The result is returned to the client

Page 39: system automation, integration and recovery

Build Deployment Automation

Stop/start applications on target servers
– Use the "HostsSSH: run Linux commands on a list of hosts" utility workflow to run the stop/start command on target hosts

Detach/attach software policies and remediate target servers
– Use OO out-of-box operations

Update RPM in software policies
– Use the OCLI update command in the software policy service to replace the RPM in the target software policy

Page 40: system automation, integration and recovery

Build Deployment Automation

Put it all together!

Build and upload RPM

Identify SP, RPM, and target servers

Start/stop application in target servers

Detach/attach SP, replace RPM in SP, and Remediate
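The four stages above can be chained so that a failure in any stage stops the deployment. The sketch below is an assumption-heavy model: every stage function is a hypothetical stub that only prints a message, and the split into "stop, swap, start" is one plausible ordering that the slide does not spell out.

```shell
#!/bin/sh
# Chain the deployment stages and stop at the first failure.
# All stage functions are hypothetical stubs; a real flow would call OCLI,
# the HostsSSH utility workflow, and the OO policy operations instead.
build_and_upload_rpm()   { echo "build and upload RPM"; }
identify_targets()       { echo "identify SP, RPM, and target servers"; }
stop_applications()      { echo "stop application on target servers"; }
swap_rpm_and_remediate() { echo "detach SP, replace RPM, attach SP, remediate"; }
start_applications()     { echo "start application on target servers"; }

deploy_build() {
  for db_stage in build_and_upload_rpm identify_targets stop_applications \
                  swap_rpm_and_remediate start_applications; do
    if ! "$db_stage"; then
      echo "stage failed: $db_stage" >&2
      return 1
    fi
  done
}
```

Running `deploy_build` prints the five stage messages in order. If a stage returns non-zero, later stages are skipped, so applications are not restarted on top of a half-updated software policy.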

Page 41: system automation, integration and recovery


Unattended WebMD Content Backup

Page 42: system automation, integration and recovery

Unattended WebMD Content Backup

Goal: Develop two OO workflows:
1. Shut down all components and back up WebMD contents
2. Bring all components up

Workflow key features:
– Identify target servers
– Windows: stop/start Windows services and IIS sites
– Linux: stop/start applications and run any script if needed
– Send error/success email notices
– Use the OO scheduler to trigger the cold backup
– The workflow sets up another schedule that triggers a second flow to bring all components back up

Page 43: system automation, integration and recovery

Unattended WebMD Content Backup

Workflows Overview

Flow 1:
1. Shut down all components
2. Run file backup
3. Run DB backup
4. Schedule another flow (flow 2) to start all components

Flow 2:
1. Check backup status
2. Start all components

60 min
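The hand-off between the two flows can be sketched as follows. Several things here are assumptions rather than facts from the slides: the "60 min" label is read as the delay before flow 2 fires, flow 1 is assumed to record its outcome in a status file (`STATUS_FILE`), and all step functions are stubs. The slide also does not say what flow 2 does when the check fails; this sketch simply reports and stops.

```shell
#!/bin/sh
# Two-flow cold backup hand-off, modeled with a status file.
# STATUS_FILE, DELAY_MINUTES, and every function name are hypothetical;
# the real flows are triggered by the OO scheduler.
STATUS_FILE=${STATUS_FILE:-/tmp/cold_backup.status}
DELAY_MINUTES=60

shutdown_components() { echo "shut down all components"; }
backup_files()        { echo "file backup"; }
backup_db()           { echo "DB backup"; }
start_components()    { echo "start all components"; }

# Epoch time at which flow 2 should run, given a reference epoch time.
flow2_start_epoch() {
  echo $(( $1 + DELAY_MINUTES * 60 ))
}

flow1() {
  if shutdown_components && backup_files && backup_db; then
    echo OK > "$STATUS_FILE"
  else
    echo FAILED > "$STATUS_FILE"
  fi
  echo "flow 2 due at epoch $(flow2_start_epoch "$(date +%s)")"
}

flow2() {
  # Start components only when flow 1 recorded a successful backup.
  if [ "$(cat "$STATUS_FILE" 2>/dev/null)" = "OK" ]; then
    start_components
  else
    echo "backup not confirmed" >&2
    return 1
  fi
}
```

Splitting the work into two scheduled flows means no long-running process has to stay alive while the backups complete; flow 2 only needs the recorded status.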

Page 44: system automation, integration and recovery


Maintenance Free System

Page 45: system automation, integration and recovery

Maintenance Free System

Goal: Proactively maintain the health of our applications without shutting them down

Workflow key features:
– Automatically clear cache and stale data without shutting down or restarting applications
– Purge outdated publishing data and logs
– Ensure that the most relevant information is retained
– Improve both system-level and publishing performance
– Minimize the need for frivolous restarts
– Keep our applications online longer

Page 46: system automation, integration and recovery

Maintenance Free System

Workflow details
– Single SSH node
– Runs a script to purge data/log files older than 3 days
– Runs on the OO scheduler once a day
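The slides do not show the purge script itself. A minimal sketch of such a script, assuming files are selected by modification time and that `find` supports `-mmin` (GNU and BSD `find` do), could look like this; `purge_old_files` and its arguments are hypothetical.

```shell
#!/bin/sh
# Delete regular files under a directory that were last modified more than
# N days ago (default 3). -mmin is used so that "3 days" means 72 hours;
# -mtime +3 would only match files at least 4 full days old.
purge_old_files() {
  pf_dir=$1
  pf_days=${2:-3}
  if [ ! -d "$pf_dir" ]; then
    echo "not a directory: $pf_dir" >&2
    return 1
  fi
  find "$pf_dir" -type f -mmin +"$((pf_days * 1440))" -exec rm -f {} +
}
```

A call such as `purge_old_files /path/to/logs 3`, run once a day by the scheduler, keeps the directory bounded to roughly three days of data without touching the running application.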

Page 47: system automation, integration and recovery


Results from HP SA/OO Implementation

Page 48: system automation, integration and recovery

Better Life in cmsops - 1

Sampling duration: Oct 11, 2007 – Jul 24, 2009 (653 days / 426 working days)
Source: customized TeamTrack reports and emails
Summary:
– 1772 TeamTrack requests
– 479 email requests*
– 5.3 tickets/working day

Sampling duration: Jul 25, 2009 – Dec 10, 2009 (135 days / 93 working days)
Source: customized TeamTrack reports and emails
Summary:
– 248 TeamTrack requests
– 35 email requests (reduced by 35%)
– 3.1 tickets/working day
– 285 cmsai requests (self-service)

Page 49: system automation, integration and recovery

Better Life in cmsops - 2

Non-prod environments are self-serviceable
15% of build deployment is automated
Automatic/scheduled data/log purging
Scheduled/unattended cold backup*

Page 50: system automation, integration and recovery


Q/A

Page 51: system automation, integration and recovery

To learn more on this topic, and to connect with your peers after the conference, visit the HP Software Solutions Community: www.hp.com/go/swcommunity
