Disaster Recovery and Business Continuity Planning in a University Environment

Mardecia Bell, Ann Harris

Copyright Mardecia Bell/Ann Harris 2005. This work is the intellectual property of the authors. Permission is granted for this material to be shared for non-commercial, educational purposes, provided that this copyright statement appears on the reproduced materials and notice is given that the copying is by permission of the authors. To disseminate otherwise or to republish requires written permission from the authors.

The realization that a single data center serving both the central academic and administrative IT environments was a single point of failure prompted NC State University to implement a disaster recovery strategy for communications and for the critical applications residing in its mainframe and open systems computing environments.

History/Timeline

1997 Initiated with the administrative environment; mainframe environment recovery test

1999 Y2K: Business Continuity concept; acquired central repository software (LDRPS)

2001 Scheduled annual mainframe recovery test; included communications and academic environment

2002 Expanded to include Enterprise Business Continuity/Disaster Recovery Planning

2004 Successful DR test of ERP systems

2005 Co-processing of production services began in Data Center II

Implementation Steps

• Gain Sponsorship
• Establish Steering Committees
• Develop University Policy/Regulation
• Create DR Structure/Establish Staffing
• Market Program
• Establish Central Repository
• Review & Test Plans Regularly

Gain Sponsorship

• Office of the President – University System
• Chancellor
• Executive Management

– Present your Business Case
– Identify the roles involved
– Provide Executive Summary of BC/DR Program
– Present Statement of Work and Project Plan

• Add responsibilities to staff work plans

Establish Steering Committees

• IT Steering Committee
• Business/Service Steering Committee
• Both committees are composed of:

– Vice Chancellor/Vice Provost Level
– Representatives from Critical Areas of the Campus
– Ex Officio members from IT areas

• Mission of IT Steering Committee
– Provide guidance and oversight for the combined academic and administrative Disaster Recovery Plan.

Policy/Regulations/Rule

• Develop a Policy or Regulation to affirm the mandate and promote cooperation

Divide Campus Into Groupings

• Space/Facilities
• Teaching and Academic Programs
• Academic IT
• Administrative IT
• Environmental Health and Public Safety
• Business Administration
• Research Programs
• Student Affairs
• Extension and Engagement

Resource Projections

• Hire Full-Time Business Continuity and Disaster Recovery Personnel
– Director of Business Continuity (plus 1 Business Analyst)
– Admin IT DR Coordinator (plus 1 Business Analyst)
– Academic DR Coordinator (part-time)

• Add BC/DR responsibilities to work plan of existing staff

• Identify Coordinators for each business unit

Marketing

• Present at campus departmental meetings
• Create a Website
• Utilize listservs
• Campus Newspaper
• Network with peer institutions
• Remain abreast of industry standards
• Attend conferences, workshops and seminars

Establish Central Information Repository

Continuous Implementation

Accomplishments

• Disaster Recovery and Business Continuity Plan

• Risk Assessments for Critical Business Units
• Successful Mainframe Recovery Tests
• Designed and implemented infrastructure for the central computing environment (academic & administrative) in the secondary data center.

• Implementation of recovery strategies in secondary data center

• Creation of Administrative IT Disaster Recovery Unit

Illustration of Various DR Deployments

• Fault-tolerant cluster (file and print services): one node runs A in production and holds B's configuration; the other runs B in production and holds A's configuration
• Distributed deployment (hosted systems): A Production, A Development, A Production
• Co-processing and load-balancing (ERP): three A Production instances
• Data replication (mainframe): three Server/Data pairs
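The fault-tolerant cluster pattern above can be read as mutual standby: each node runs one service in production and keeps the other service's configuration ready for takeover. The sketch below assumes two hypothetical nodes (`node1`, `node2`) and the services `A` and `B` from the illustration, and only computes where each service should run given which nodes are healthy; real cluster software handles failure detection and the actual takeover itself.

```python
# Mutual-standby placement for a two-node fault-tolerant cluster.
# Node and service names are hypothetical placeholders.

PRIMARY = {"node1": "A", "node2": "B"}  # service each node runs in production
STANDBY = {"node1": "B", "node2": "A"}  # service configuration each node holds


def placement(healthy: set[str]) -> dict[str, list[str]]:
    """Return the services each healthy node should run."""
    result: dict[str, list[str]] = {node: [] for node in sorted(healthy)}
    for node, service in PRIMARY.items():
        if node in healthy:
            result[node].append(service)
            continue
        # Primary node is down: hand the service to a healthy node holding its configuration.
        for backup, held in STANDBY.items():
            if held == service and backup in healthy:
                result[backup].append(service)
                break
    return result


print(placement({"node1", "node2"}))  # {'node1': ['A'], 'node2': ['B']}
print(placement({"node1"}))           # {'node1': ['A', 'B']}
```

If neither node is healthy, the function returns an empty mapping, signalling that no service can be placed.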

Enterprise Resource Planning (ERP) Deployment

Systems: Financial System; Human Resources (Version 8.8); Student Information System (under construction)

Diagram: campus users are served by web, application and batch servers in both DC I and DC II; each data center also has a DB server, with data on a Storage Area Network.
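Co-processing in DC I and DC II means requests are spread across both data centers and routed around whichever one fails. A minimal round-robin sketch, assuming hypothetical server names and health flags set by some external monitor (in production this is normally the job of a dedicated load balancer):

```python
import itertools


class DataCenterBalancer:
    """Round-robin over web servers in both data centers, skipping unhealthy ones."""

    def __init__(self, servers: list[str]) -> None:
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._cycle = itertools.cycle(self.servers)

    def mark_down(self, server: str) -> None:
        self.healthy.discard(server)

    def mark_up(self, server: str) -> None:
        self.healthy.add(server)

    def next_server(self) -> str:
        # Try at most one full rotation before giving up.
        for _ in range(len(self.servers)):
            candidate = next(self._cycle)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy servers in either data center")


# Hypothetical host names, one web server per data center.
lb = DataCenterBalancer(["dc1-web1", "dc2-web1"])
print([lb.next_server() for _ in range(4)])  # ['dc1-web1', 'dc2-web1', 'dc1-web1', 'dc2-web1']
lb.mark_down("dc1-web1")                     # simulate a DC I outage
print([lb.next_server() for _ in range(2)])  # ['dc2-web1', 'dc2-web1']
```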

Summary and Future Steps

Diagram: DC I and DC II, each shown with hosted systems, infrastructure, data on a Storage Area Network, and backup/vaulting. Components depicted in each data center:

• Active Directory/Windows
• Novell Directory Services/Novell
• Citrix
• ERP Web, Application, Batch and DB Servers
• Development Server, Mainframe Server, Web Server, Database Server
• Email/Calendar Anti-SPAM
• File/Print, User Home
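Because the target state mirrors the same components in both data centers, a simple inventory comparison can flag anything deployed in one center but not the other. The inventories in this sketch are hypothetical and chosen only to show the output; they do not describe the actual deployment.

```python
def parity_gaps(dc_i: set[str], dc_ii: set[str]) -> dict[str, list[str]]:
    """Report components deployed in one data center but missing from the other."""
    return {
        "only_in_dc_i": sorted(dc_i - dc_ii),
        "only_in_dc_ii": sorted(dc_ii - dc_i),
    }


# Hypothetical inventories for illustration, not the actual deployment.
dc_i = {"Citrix", "ERP Web", "ERP DB Server", "Mainframe Server"}
dc_ii = {"Citrix", "ERP Web", "ERP DB Server"}
print(parity_gaps(dc_i, dc_ii))
# {'only_in_dc_i': ['Mainframe Server'], 'only_in_dc_ii': []}
```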

Administrative IT Disaster Recovery Unit

Mission

• Ensure minimal risk of major disruptions to critical University systems and processes in the event that all or part of its computer operations are rendered inoperable.

• Ensure timely recovery of infrastructure and services in the event of a disruption.

• Ensure that business continuity plans are available and viable for the scenarios they address.

Risk Management

• Identify
• Mitigate
• Process Mapping

Risk Management: Risk Mitigation

• Prioritize Actions
• Evaluate Recommended Control Options
• Conduct Cost-Benefit Analysis
• Select Controls
• Assign Responsibility
• Develop Safeguard Implementation Plan
• Implement Selected Controls
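The cost-benefit and prioritization steps can be made concrete with annualized loss expectancy (ALE), a common quantitative technique; the presentation does not say which method NC State used. In this sketch a control is selected when the reduction in ALE exceeds its annual cost, and selected controls are ranked by net benefit. All control names and dollar figures are hypothetical.

```python
from dataclasses import dataclass


def ale(single_loss: float, annual_rate: float) -> float:
    """Annualized loss expectancy = single loss expectancy x annualized rate of occurrence."""
    return single_loss * annual_rate


@dataclass
class Control:
    name: str
    ale_before: float   # expected yearly loss without the control
    ale_after: float    # expected yearly loss with the control in place
    annual_cost: float  # yearly cost of running the control

    @property
    def net_benefit(self) -> float:
        return self.ale_before - self.ale_after - self.annual_cost


def select_controls(candidates: list[Control]) -> list[Control]:
    """Keep controls whose loss reduction exceeds their cost, best first."""
    worthwhile = [c for c in candidates if c.net_benefit > 0]
    return sorted(worthwhile, key=lambda c: c.net_benefit, reverse=True)


# Hypothetical outage: $200,000 per event, expected once every two years.
baseline = ale(200_000, 0.5)  # 100000.0
options = [
    Control("secondary data center failover", baseline, 20_000, 30_000),  # net 50000
    Control("extra tape rotation", baseline, 90_000, 25_000),             # net -15000
]
print([c.name for c in select_controls(options)])  # ['secondary data center failover']
```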

Risk Assessment

• System Characterization
• Threat Identification
• Vulnerability Identification
• Control Analysis
• Likelihood Determination
• Impact Analysis
• Risk Determination
• Control Recommendations
• Results Documentation

NIST SP 800-30
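The likelihood, impact and risk-determination steps can be sketched with the example risk-level matrix from NIST SP 800-30 (2002 edition), which weights likelihood as 1.0, 0.5 or 0.1 and impact as 100, 50 or 10, then bands the product into High, Medium or Low. The presentation does not say whether NC State used these exact weights, so treat them as an illustrative default rather than the university's scale.

```python
# Weights from the example risk-level matrix in NIST SP 800-30 (2002 edition).
LIKELIHOOD = {"high": 1.0, "medium": 0.5, "low": 0.1}
IMPACT = {"high": 100, "medium": 50, "low": 10}


def risk_level(likelihood: str, impact: str) -> tuple[float, str]:
    """Risk = likelihood x impact, banded as High (>50 to 100), Medium (>10 to 50), Low (1 to 10)."""
    # Round so products such as 0.1 * 100 land exactly on the band boundaries.
    score = round(LIKELIHOOD[likelihood] * IMPACT[impact], 2)
    if score > 50:
        return score, "high"
    if score > 10:
        return score, "medium"
    return score, "low"


for lk in ("high", "medium", "low"):
    print(lk, [risk_level(lk, im)[1] for im in ("high", "medium", "low")])
# high ['high', 'medium', 'low']
# medium ['medium', 'medium', 'low']
# low ['low', 'low', 'low']
```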

Process Mapping

Diagram layers: Users, Web/Application, Clients, Databases

Distributed Web Servers
• Web Servers 1 through 6: SunFire v240 (2-cpu, 8GB), Veritas Foundation Suite
• Financials, FinRep and FinTrain: WebLogic v8.1, each with an iPlanet v6.0 proxy server for authentication

Distributed Application Transaction Servers (AppServers)
• Application Servers 1 and 2: SunFire v1280 (12-cpu, 24GB), Veritas Foundation Suite
• Application Servers 3 and 4: SunFire v480 (4-cpu, 8GB), Veritas Foundation Suite
• Financials and FinRep AppServers: Tuxedo v6.5
• FinTrain AppServer: Tuxedo v6.5 on an application server, SunFire v240 (2-cpu, 4GB), Veritas Foundation Suite
• HR, HRRep and HRTrain AppServers: Tuxedo on an application server, Sun Enterprise 450 (E450), Solaris 7 (4 SparcII cpu, 4GB), Veritas Foundation Suite

Distributed Process Schedulers (Batch Servers)
• Batch Servers 1 through 5: SunFire v240 (2-cpu, 4GB), Veritas Foundation Suite
• Financials, FinRep and FinTrain Batch Servers: Tuxedo v6.5
• HR Tuxedo Process Scheduler: batch server on a Sun Enterprise 10000 (E10K) OS domain, Solaris 7 (8 SparcII cpu, 8GB), Veritas Foundation Suite

Databases
• Data Server 1: SunFire E25K OS domain (12 SparcIV cpu, 96GB), Veritas DBE Oracle with FlashSnap
– Financials (OLTP), FinRep (Reporting), FinTrain (Training): Oracle 9i
• Data Server 2: Sun E10K OS domain, Solaris 8 (12 SparcII cpu, 12GB), Veritas DBE Sybase with FlashSnap
– HR, HRRep, HRTrain: Sybase ASE 12.0.0.6
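A process map like this one also yields a restoration order: a component can only come back after everything it depends on is running. The sketch below encodes an assumed dependency chain (proxy, web, application and batch tiers, database, storage) and derives an order with a topological sort from Python's standard `graphlib` module (Python 3.9+). The edges are an illustrative assumption, not the documented NC State recovery sequence.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map loosely based on the tiers above:
# each component lists what must be running before it can be restored.
DEPENDS_ON = {
    "iPlanet proxy servers": {"WebLogic web servers"},
    "WebLogic web servers": {"Tuxedo application servers"},
    "Tuxedo application servers": {"Oracle data server"},
    "Tuxedo batch servers": {"Oracle data server"},
    "Oracle data server": {"Storage Area Network"},
}


def restoration_order(depends_on: dict[str, set[str]]) -> list[str]:
    """Order components so every dependency is restored before its dependents.

    Raises graphlib.CycleError if the map contains a circular dependency.
    """
    return list(TopologicalSorter(depends_on).static_order())


for step, component in enumerate(restoration_order(DEPENDS_ON), start=1):
    print(step, component)
```

Where two components have no dependency between them (here, web and batch tiers), the sort may list them in either order, which is a hint that they can be restored in parallel.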

Infrastructure

• Total DR through distributed high availability
• Client Recovery Solutions
• Application Restoration
• Establish collaborative partnerships with other universities

Application Restoration

• Event
• Time
• Scope of Impact
– Infrastructure
– Software
– Hardware
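The event, time and scope of impact listed above can be captured in a small incident record, and the time of the event combined with a recovery time objective (RTO) gives a restore-by deadline. This sketch assumes hypothetical per-application RTO values and treats the tightest RTO among affected applications as the deadline; the presentation does not specify actual RTOs.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

SCOPES = {"infrastructure", "software", "hardware"}

# Hypothetical recovery time objectives (RTOs) per application.
RTO = {"Financials": timedelta(hours=4), "HR": timedelta(hours=8)}


@dataclass
class Incident:
    event: str          # what happened
    time: datetime      # when the disruption began
    scope: set[str]     # impacted areas: infrastructure, software, hardware
    apps: set[str]      # applications that must be restored

    def __post_init__(self) -> None:
        unknown = self.scope - SCOPES
        if unknown:
            raise ValueError(f"unknown scope: {sorted(unknown)}")

    def restore_by(self) -> datetime:
        """The tightest RTO among the affected applications sets the deadline."""
        return self.time + min(RTO[app] for app in self.apps)

    def overdue(self, now: datetime) -> bool:
        return now > self.restore_by()


incident = Incident("database server failure", datetime(2005, 6, 1, 9, 0), {"hardware"}, {"Financials", "HR"})
print(incident.restore_by())  # 2005-06-01 13:00:00
```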

Collaborative Partnerships

Vaulting

• Readily accessible
• Secure
• Onsite
• Offsite

Critical Business Units

• Advancement Services

• All Campus Network

• Budget Office

• College of Agriculture and Life Sciences - Personnel Office

• ComTech - Data Networking

• ComTech - Telecommunications

• Contracts and Grants

• Controller's Office

• Enterprise Application and Database Services

• EH&S - Business Continuity

• EH&S - Campus Police

• EH&S - Emergency Response

• EH&S - Environmental Affairs

• EH&S - Health and Safety

• EH&S - Industrial Hygiene

• EH&S - Insurance and Risk Management

• EH&S - Radiation Safety

• EH&S - Transportation

• EH&S - Waste Management
• Enrollment Management - Admissions
• Enrollment Management - Office of Scholarships & Financial Aid
• Enrollment Management - Registration and Records

• Enterprise Technology Services and Support
• Facilities - Construction Management
• Facilities - Design and Construction Services
• Facilities - Operations
• Facilities - University Architect

• Fire Protection
• Foundations Accounting & Investments
• HR - Benefits
• HR - Employment & Compensation
• HR - Human Resource Information Management
• HR - Payroll
• ITD - Business Services
• ITD - Computer Operations
• ITD - Computer Services
• ITD - Systems
• Libraries - Administration
• Materials Management - Materials Support
• Materials Management - Purchasing
• Materials Management - University Graphics
• Real Estate
• Student Health Services
• University Cashier's Office
• University Dining
• University Housing

Business Continuity Planning

Communication

• Consistency in plan updating
• Training
• Partnering
• Emergency Communication standardization
– Call Trees
– Mobile Devices
– Website
– Incident Command System Call Center
– Incident Report Plan
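A call tree is a breadth-first fan-out: the initiator calls a few people, each of whom calls a few more. The sketch groups contacts by calling round, which gives a rough sense of how many sequential rounds a full call-out takes. The tree shape is hypothetical; the role names are borrowed from the staffing plan earlier in this presentation.

```python
# Hypothetical call tree using roles named elsewhere in this presentation.
CALL_TREE = {
    "Director of Business Continuity": ["Admin IT DR Coordinator", "Academic DR Coordinator"],
    "Admin IT DR Coordinator": ["Unit Coordinator 1", "Unit Coordinator 2"],
    "Academic DR Coordinator": ["Unit Coordinator 3"],
}


def notification_rounds(tree: dict[str, list[str]], root: str) -> list[list[str]]:
    """Group contacts by calling round (breadth-first); round 0 is the initiator."""
    rounds: list[list[str]] = []
    seen = {root}
    frontier = [root]
    while frontier:
        rounds.append(frontier)
        next_round: list[str] = []
        for caller in frontier:
            for callee in tree.get(caller, []):
                if callee not in seen:  # skip anyone already reached
                    seen.add(callee)
                    next_round.append(callee)
        frontier = next_round
    return rounds


for i, people in enumerate(notification_rounds(CALL_TREE, "Director of Business Continuity")):
    print(i, people)
```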

IT Disaster Categorization

• Category 1: A single person or group in a Critical Business Unit (CBU) is unable to perform their critical functions

• Category 2: An entire CBU is unable to perform its critical functions

• Category 3: Multiple CBUs are unable to perform their critical functions

• Category 4: Non-CBUs are unable to perform their critical functions

• Category 5: A widespread event that impacts the entire University
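The five categories can be applied mechanically once an incident's footprint is known. This sketch assumes each affected Critical Business Unit is marked "partial" (a person or group cannot work) or "full" (the whole unit cannot work), and that when several conditions hold, the highest category wins; both are interpretations, not rules stated in the presentation. Returning 0 for "no disruption" is likewise an added convention.

```python
def categorize(university_wide: bool, cbu_impact: dict[str, str], non_cbu_impacted: bool) -> int:
    """Return IT disaster category 1-5 (0 means no disruption).

    cbu_impact maps each affected Critical Business Unit to "partial"
    (a person or group cannot work) or "full" (the whole unit cannot work).
    Assumption: when several conditions hold, the highest category wins.
    """
    for unit, level in cbu_impact.items():
        if level not in ("partial", "full"):
            raise ValueError(f"{unit}: impact must be 'partial' or 'full'")
    full_units = sum(1 for level in cbu_impact.values() if level == "full")
    if university_wide:
        return 5
    if non_cbu_impacted:
        return 4
    if full_units >= 2:
        return 3
    if full_units == 1:
        return 2
    if cbu_impact:
        return 1
    return 0


print(categorize(False, {"Budget Office": "full", "University Cashier's Office": "partial"}, False))  # 2
```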

Goals

• Total DR through distributed high availability

• Standardized Emergency Communications

• Immediate Client Recovery Solutions
• Improved RTO (recovery time objective)

Ann Harris

Asst Dir, Administrative IT Disaster Recovery

919-515-9228

[email protected]

http://www.fis.ncsu.edu/dr