© Blackboard, Inc. All rights reserved.

Disaster Recovery Planning: Sharing Experiences from Blackboard ASP Services

Harry Choi – Director, ASP Services

Jonas Hirshfield – Director, ASP Infrastructure Operations

Post on 23-Dec-2015

About Forward-Looking Statements

» Any statements in this presentation about future expectations, plans and prospects for Blackboard and other statements containing the words “believes,” “anticipates,” “plans,” “expects,” “will,” and similar expressions, constitute forward-looking statements within the meaning of The Private Securities Litigation Reform Act of 1995. Actual results may differ materially from those indicated by such forward-looking statements as a result of product development changes and other important factors discussed in our filings with the SEC. We may make statements regarding our product development and service offering initiatives, including the content of future product upgrades, updates or functionality in development.  While such statements represent our current intentions, they may be modified, delayed or abandoned without prior notice and there is no assurance that such offering, upgrades, updates or functionality will become available unless and until they have been made generally available to our customers.

Agenda:

» Blackboard ASP Services Introduction
» MTTR Planning vs. Disaster Recovery
» Planning for Disaster Recovery:
  » Defining “Disaster”
  » Setting up the Requirements
  » Planning & Testing

Blackboard ASP Services Introduction

» 450+ Customers from All Market Sectors

» 4 Datacenters – 2 in Northern Virginia; 1 in Vancouver; 1 in Amsterdam

» Over 4 Million Users in our Systems – Over 3 Million Active Users

» 1500+ Servers, 250 TB Storage Capacity & Growing

» 7 Terabytes of Data Transferred Daily

» 100 Million+ HTTP Requests Served Per Day at Peak Times

» Blackboard Academic Suite™, Blackboard Commerce Suite™, Email Hosting

» 99.7% or Better Uptime Guarantee
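The uptime and traffic figures above are easier to interpret as concrete budgets. The sketch below works through that arithmetic. It assumes a 365-day year, a 30-day month and decimal units (1 TB = 10^12 bytes). None of these assumptions come from the slide.

```python
# Back-of-the-envelope arithmetic for the figures on this slide.
# Assumptions (not stated on the slide): a 365-day year, a 30-day month,
# and decimal units (1 TB = 10**12 bytes).

HOURS_PER_YEAR = 365 * 24        # 8760
HOURS_PER_MONTH = 30 * 24        # 720
SECONDS_PER_DAY = 24 * 60 * 60   # 86400


def downtime_budget_hours(uptime_pct: float, period_hours: float) -> float:
    """Maximum downtime in a period that still satisfies an uptime percentage."""
    return (1 - uptime_pct / 100) * period_hours


def per_second(total_per_day: float) -> float:
    """Average per-second rate for a quantity measured per day."""
    return total_per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    print(f"99.7% uptime: {downtime_budget_hours(99.7, HOURS_PER_YEAR):.2f} h/year, "
          f"{downtime_budget_hours(99.7, HOURS_PER_MONTH) * 60:.1f} min/month")
    print(f"7 TB/day: {per_second(7 * 10**12 * 8) / 1e6:.0f} Mbit/s on average")
    print(f"100 million requests/day: {per_second(100e6):.0f} requests/s on average")
```

Under these assumptions:

- A 99.7% guarantee leaves about 26 hours of downtime per year, or roughly 130 minutes per 30-day month.
- The traffic figures average out to roughly 650 Mbit/s and 1,160 requests per second.

Actual peak-hour rates are higher than these whole-day averages.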

Fully Managed Service: Backed by a Team of Experts

What is “Disaster Recovery”?

» “Ability to recover from the loss of a complete site, whether due to a natural disaster or malicious intent.”

» “A plan of action to recover from an unlikely event of a severe or catastrophic business disruption.”

» It is NOT planning for Mean Time To Recovery (MTTR) from daily operational risks.

MTTR Choices: Recovery-Oriented Scalable Infrastructure

» On-Demand, Redundant Scaling Technologies
  » Attached clustered storage that quickly accommodates client growth
  » 2N Redundancy at Core Infrastructure Level
  » Burstable, Redundant Internet Connectivity
  » Load Balancing
  » Dual-Core CPUs, Caching, Hyper-Threading
  » Clustering Technology

» Recovery-Oriented Choices:
  » Autonomic Capabilities – Datacenter / Network / Systems
  » Warm Standby (racked, stacked and powered up)
  » SnapMirror Technologies
  » Oracle DataGuard & RMAN Technologies
  » Cold Standby
  » “Platinum” Service Contracts from Service Providers and Vendors
  » N+1 Redundancy Capabilities at Client Level

It’s All About Tolerance for MTTR vs. Costs
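The MTTR-versus-cost trade-off can be made concrete with the standard steady-state availability formula, MTBF / (MTBF + MTTR). It can also be paired with the usual estimate for redundant units. The sketch below uses hypothetical MTBF and MTTR values. It also assumes independent failures. That assumption is exactly what a whole-site disaster violates, which is why such a disaster is handled as disaster recovery rather than MTTR planning.

```python
# Steady-state availability versus MTTR, and the effect of 2N redundancy.
# The MTBF/MTTR values in the example are hypothetical illustrations only.


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Long-run fraction of time a component is up: MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def redundant_availability(a: float, copies: int = 2) -> float:
    """Availability of `copies` redundant units, assuming failures are
    independent and one surviving unit can carry the full load."""
    return 1 - (1 - a) ** copies


if __name__ == "__main__":
    mtbf = 2000.0  # hypothetical: roughly one failure every 83 days
    for label, mttr in (("cold standby", 24.0), ("warm standby", 2.0)):
        single = availability(mtbf, mttr)
        pair = redundant_availability(single)
        print(f"{label}: MTTR {mttr:.0f} h -> single {single:.5f}, 2N {pair:.7f}")
```

With these made-up numbers, cutting MTTR from 24 hours (cold standby) to 2 hours (warm standby) raises single-unit availability from about 98.8% to about 99.9%. The cost is keeping spare hardware racked and powered.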

Bb ASP Network

Highly available network design

• No single point of failure at the infrastructure layer

• Quarterly infrastructure testing

• Maintenance with minimal downtime

Enterprise-class hardware devices

• Clustered Network Appliance Storage

• Foundry Routers and Switches

• Juniper Firewall/VPN

• TippingPoint IPS

• NetScaler Load Balancers

• Infoblox Hardened DNS Appliances

Optional Security Services

• VPNs for secure data transfer

• SSL for encrypted access

Levels of Data Backups

» 1st level backup: using snapshot technology, file systems & databases are backed up on Network Attached Storage systems

» 2nd level backup: using NetApp NearStore devices, 1st level backups are stored online, off-site, for 30 days

» 3rd level backup: weekly backups are stored on tape, off-site, for 30 days
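One way to reason about the three backup levels is to ask which of them could still supply a restore of data that is N days old. The sketch below models that question, with and without the primary site.

- The 30-day off-site retention for levels 2 and 3 comes from the slide.
- The snapshot retention for level 1 is a hypothetical placeholder (`SNAPSHOT_RETENTION_DAYS`), because the slide does not state it.
- Treating level 1 as on-site is also an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupTier:
    name: str
    offsite: bool
    retention_days: int


# Retention for tiers 2 and 3 comes from the slide (30 days off-site).
# Tier 1 retention and its on-site location are assumptions: the slide does
# not state them, so SNAPSHOT_RETENTION_DAYS is a hypothetical placeholder.
SNAPSHOT_RETENTION_DAYS = 7

TIERS = (
    BackupTier("1st level: NAS snapshots", offsite=False, retention_days=SNAPSHOT_RETENTION_DAYS),
    BackupTier("2nd level: NearStore, online", offsite=True, retention_days=30),
    BackupTier("3rd level: weekly tape", offsite=True, retention_days=30),
)


def tiers_holding(age_days: float, site_lost: bool = False) -> list[str]:
    """Names of tiers expected to still hold a copy of data `age_days` old.

    When the primary site is lost, only off-site tiers remain usable.
    Weekly tape granularity is ignored: a tape restore may be up to a week
    older than the requested point in time.
    """
    return [
        tier.name
        for tier in TIERS
        if age_days <= tier.retention_days and (tier.offsite or not site_lost)
    ]


if __name__ == "__main__":
    print(tiers_holding(3))                   # all three tiers
    print(tiers_holding(3, site_lost=True))   # off-site tiers only
    print(tiers_holding(45))                  # nothing left
```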

Monitoring Tools for Proactive Management

» Key Performance Metrics polled every 5 minutes
  » cpu = 5-minute load average
  » disk = % free on local mount points
  » nfs = % free on NFS mount points
  » msgs = scanning syslog for keywords
  » memory = real, virtual and swap
  » procs = scanning ‘ps’ output
  » Obak = Oracle backup status
  » ofiles = Oracle data/index file sizes
  » files = other file/directory sizes
  » svcs = Windows services checking
» Application Check for Response Time
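A few of the listed checks (cpu, disk/nfs, msgs) can be sketched with the Python standard library. This is not the tooling used by Blackboard ASP, which the slide does not name. The alert threshold and keyword list below are hypothetical. In practice a scheduler such as cron would call `run_checks` once per 5-minute polling interval.

```python
import os
import re
import shutil
from typing import Iterable

# Hypothetical thresholds and keywords: the slide lists the metrics but not
# the alert thresholds or the tooling that evaluates them.
POLL_INTERVAL_SECONDS = 5 * 60   # metrics are polled every 5 minutes
DISK_FREE_MIN_PCT = 10.0
SYSLOG_KEYWORDS = ("error", "panic", "i/o failure")


def percent_free(mount_point: str) -> float:
    """'disk' / 'nfs' check: percent free space on a mount point."""
    usage = shutil.disk_usage(mount_point)
    return 100.0 * usage.free / usage.total


def load_average_5min() -> float:
    """'cpu' check: the 5-minute load average (Unix only)."""
    return os.getloadavg()[1]


def scan_log(lines: Iterable[str], keywords: Iterable[str] = SYSLOG_KEYWORDS) -> list[str]:
    """'msgs' check: log lines containing any keyword, ignoring case."""
    pattern = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)
    return [line for line in lines if pattern.search(line)]


def run_checks(mount_points: Iterable[str], log_lines: Iterable[str]) -> list[str]:
    """One polling pass; returns human-readable alerts."""
    alerts = []
    for mount_point in mount_points:
        free = percent_free(mount_point)
        if free < DISK_FREE_MIN_PCT:
            alerts.append(f"disk: {mount_point} has only {free:.1f}% free")
    alerts.extend(f"msgs: {line.strip()}" for line in scan_log(log_lines))
    return alerts


if __name__ == "__main__":
    sample_log = ["kernel: eth0 link up", "kernel: I/O failure on dev sda"]
    for alert in run_checks(["/"], sample_log):
        print(alert)
    print(f"cpu: 5-minute load average {load_average_5min():.2f}")
```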

Disaster Recovery Requirements

» Set the institution’s definition of “disaster”
  » Driven by Business Impact
  » Priority of Mission-Critical Applications

» Define Requirements
  » Set Threshold for Recovery

Questions to Consider:

» What is the threshold on recovery time (RTO) and recovered data (RPO)?

» What is the objective during the disaster recovery period?
  » Minimum basic functionality, i.e. availability of online materials and course continuation?
  » Full production availability, including LDAP, customizations and Building Blocks?

» What is the plan for post-DR?

ASP Business Continuity Service Offering:

» Recovery Time Objective (RTO)
  » RTO is the target time for having the Blackboard Business Continuity Service up and running, measured from the point at which Blackboard is made aware of the failure of the client’s primary Blackboard application system.

» Recovery Point Objective (RPO)
  » RPO is the objective of minimizing loss of the client’s database and file storage content by backing up the client’s information at least as frequently as guaranteed under each service level.

» Customizations & Configuration
  » Dependent on the client’s requirements and RTO & RPO objectives
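The RTO and RPO definitions above translate directly into two durations measured from an incident timeline. The sketch below shows how a post-incident review might check them against the targets.

- RTO is measured from notification to restored service.
- RPO is the age of the last usable backup at the moment of failure.

The timeline is hypothetical. The strict "less than" comparison for RPO follows the "last backup less than N hours" wording of the service levels.

```python
from datetime import datetime, timedelta


def achieved_rto(notified_at: datetime, restored_at: datetime) -> timedelta:
    """Recovery time, measured as the RTO definition describes: from the
    moment Blackboard is made aware of the failure until service is running."""
    return restored_at - notified_at


def achieved_rpo(failure_at: datetime, last_good_backup_at: datetime) -> timedelta:
    """Age of the last usable backup at the moment of failure, i.e. the window
    of changes that is lost."""
    return failure_at - last_good_backup_at


def within_objectives(rto: timedelta, rpo: timedelta,
                      rto_target: timedelta, rpo_target: timedelta) -> bool:
    """Service must be restored within the RTO target; the last backup must be
    less than the RPO target old."""
    return rto <= rto_target and rpo < rpo_target


if __name__ == "__main__":
    # Hypothetical incident timeline, checked against 6 h RTO and < 3 h RPO targets.
    last_backup = datetime(2024, 1, 15, 2, 0)
    failure = datetime(2024, 1, 15, 4, 30)
    notified = datetime(2024, 1, 15, 4, 45)
    restored = datetime(2024, 1, 15, 9, 15)

    rto = achieved_rto(notified, restored)
    rpo = achieved_rpo(failure, last_backup)
    print(rto, rpo, within_objectives(rto, rpo, timedelta(hours=6), timedelta(hours=3)))
```

Here recovery took 4.5 hours and 2.5 hours of changes were lost, so both targets are met.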

Business Continuity Service Levels

Locally hosted client:

» Weekly backup service: RTO 24 hours; RPO last backup less than 8 days; Customization & Configuration fully backed up (if possible)
» Monthly backup service: RTO 72 hours; RPO last backup less than 32 days; Customization & Configuration rebuild
» Incident-based service: RTO 120 hours; RPO latest backup received; Customization & Configuration N/A

Blackboard ASP hosted client:

» Level 3 Service: RTO 6 hours; RPO last backup less than 3 hours; Customization & Configuration fully backed up
» Level 2 Service: RTO 24 hours; RPO last backup less than 12 hours; Customization & Configuration fully backed up
» Level 1 Service: RTO 72 hours; RPO last backup less than 24 hours; Customization & Configuration rebuild
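An institution's answers to the RTO/RPO threshold questions can be matched against this table mechanically. The sketch below transcribes the service levels, converting days to hours, and lists the levels whose guarantees are at least as strict as a stated requirement. The function and field names are hypothetical. The incident-based service has no fixed RPO, so the filter excludes it.

```python
from typing import NamedTuple, Optional


class ServiceLevel(NamedTuple):
    name: str
    hosting: str                  # "local" or "asp"
    rto_hours: int                # recovery time objective
    rpo_hours: Optional[int]      # last backup is less than this old; None = no fixed bound
    customization: str


# Transcribed from the Business Continuity Service Levels table, with days
# converted to hours (8 days = 192 h, 32 days = 768 h).
LEVELS = (
    ServiceLevel("Weekly backup service", "local", 24, 8 * 24, "Fully backed up (if possible)"),
    ServiceLevel("Monthly backup service", "local", 72, 32 * 24, "Rebuild"),
    ServiceLevel("Incident-based service", "local", 120, None, "N/A"),
    ServiceLevel("Level 3 Service", "asp", 6, 3, "Fully backed up"),
    ServiceLevel("Level 2 Service", "asp", 24, 12, "Fully backed up"),
    ServiceLevel("Level 1 Service", "asp", 72, 24, "Rebuild"),
)


def levels_meeting(hosting: str, max_rto_hours: float, max_rpo_hours: float) -> list[str]:
    """Levels whose RTO and RPO are at least as strict as the requirement.
    The incident-based service has no fixed RPO, so it never qualifies here."""
    return [
        level.name
        for level in LEVELS
        if level.hosting == hosting
        and level.rto_hours <= max_rto_hours
        and level.rpo_hours is not None
        and level.rpo_hours <= max_rpo_hours
    ]


if __name__ == "__main__":
    # An ASP-hosted institution that tolerates one day down and half a day of lost data:
    print(levels_meeting("asp", max_rto_hours=24, max_rpo_hours=12))
```

For that example requirement, Level 3 and Level 2 qualify. Choosing between them then comes down to cost and to the customization and configuration column.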

Business Continuity Service Offering:

» Testing & Collaboration During Setup
» Customized Service
» Building Blocks, LDAP, & Other Considerations

» Simulation Tests Once a Year
» Dedicated Database Server(s) & Reserved App Servers
» In a DR Situation, Service Up for 30 Days
» Client Audit of Datacenter & Operations

Disaster Recovery Option

[Diagram: Blackboard ASP VA1 Facility and Blackboard ASP VA2 Facility, each divided into a Private Zone and a Public Zone, connected by an Optical Fiber Ring]

Disaster Recovery – Client Options

» Client Application Content
  » Use of Network Appliance’s SnapMirror technology to mirror client application data in real time across facilities
  » Dedicated standby hardware

» Client Oracle Content
  » Traditional Oracle exports: long RTO & RPO
  » Oracle RMAN technology: long to medium RTO & RPO
    » Warm dedicated standby server
  » Oracle DataGuard: best RTO & RPO
    » Hot dedicated standby server
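Whichever replication option is chosen, the data at risk at any moment is everything written since the last completed transfer to the standby facility. That makes replication lag a live measure of RPO exposure. The sketch below is generic. It does not use any SnapMirror or DataGuard interface; the last-transfer timestamp is assumed to come from the storage or database system's own status reporting, which is not modeled here. The warning threshold `warn_fraction` is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def replication_exposure(last_transfer_completed: datetime, now: datetime) -> timedelta:
    """Changes written since the last completed mirror transfer would be lost
    if the primary facility failed at `now`."""
    return now - last_transfer_completed


def lag_status(exposure: timedelta, rpo_target: timedelta, warn_fraction: float = 0.5) -> str:
    """Classify replication lag against an RPO target.
    `warn_fraction` is a hypothetical early-warning threshold."""
    if exposure >= rpo_target:
        return "BREACH"
    if exposure >= rpo_target * warn_fraction:
        return "WARN"
    return "OK"


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    # Hypothetical timestamp; a real one would come from the replication
    # system's status reporting.
    last_transfer = now - timedelta(minutes=20)
    exposure = replication_exposure(last_transfer, now)
    print(exposure, lag_status(exposure, rpo_target=timedelta(hours=3)))
```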

The Human Factor:

» Rigorous & Regular Training
» Redundancy in Skill Sets
  » Plan for loss of critical staff in a DR event.
» Change Management Control Tools – Central Authentication System, Automated Scripts, Documentation, etc.
» Readiness Tests – e.g. Fall Preparation Readiness Testing
  » Perform routine testing to ensure technology is working as expected.
» Documentation
  » Disaster recovery procedures should be well documented.
» Plan for the unexpected:
  » Loss of critical staff
  » No physical access to the facility
  » Loss of traditional internet access to the facility
    » Install a POTS line with serial connections to the infrastructure

Questions? Comments?