Disaster Recovery and Business Continuity Planning in a University Environment
Mardecia Bell, Ann Harris
Copyright Mardecia Bell/Ann Harris 2005. This work is the intellectual property of the authors. Permission is granted for this material to be shared for non-commercial, educational purposes, provided that this copyright statement appears on the reproduced materials and notice is given that the copying is by permission of the authors. To disseminate otherwise or to republish requires written permission from the authors.
The realization that a single data center hosting both the central academic and administrative IT environments was a single point of failure prompted NC State University to implement a disaster recovery strategy for communications and for critical applications residing in the mainframe and open systems computing environments.
History/Timeline
1997 Initiated with the administrative environment: mainframe environment recovery test
1999 Y2K: business continuity concept; acquired central repository software (LDRPS)
2001 Scheduled annual mainframe recovery test; included communications & academic environment
2002 Expanded to include Enterprise Business Continuity/Disaster Recovery Planning
2004 Successful DR test of ERP systems
2005 Co-processing of production services began in Data Center II
Implementation Steps
• Gain Sponsorship
• Establish Steering Committees
• Develop University Policy/Regulation
• Create DR Structure/Establish Staffing
• Market Program
• Establish Central Repository
• Review & Test Plans Regularly
Gain Sponsorship
• Office of the President – University System
• Chancellor
• Executive Management
– Present your Business Case
– Identify the roles involved
– Provide Executive Summary of BC/DR Program
– Present Statement of Work and Project Plan
• Add responsibilities to staff work plans
Establish Steering Committees
• IT Steering Committee
• Business/Service Steering Committee
• Both committees are composed of
– Vice Chancellor/Vice Provost level members
– Representatives from critical areas of the campus
– Ex officio members from IT areas
• Mission of IT Steering Committee
– Provide guidance and oversight for the combined academic and administrative Disaster Recovery Plan.
Policy/Regulations/Rule
• Develop a Policy or Regulation to affirm the mandate and promote cooperation
Divide Campus Into Groupings
• Space/Facilities
• Teaching and Academic Programs
• Academic IT
• Administrative IT
• Environmental Health and Public Safety
• Business Administration
• Research Programs
• Student Affairs
• Extension and Engagement
Resource Projections
• Hire Full-Time Business Continuity and Disaster Recovery Personnel
– Director of Business Continuity (plus 1 Business Analyst)
– Admin IT DR Coordinator (plus 1 Business Analyst)
– Academic DR Coordinator (part-time)
• Add BC/DR responsibilities to work plan of existing staff
• Identify Coordinators for each business unit
Marketing
• Present at campus departmental meetings
• Create a Website
• Utilize listservs
• Campus Newspaper
• Network with peer institutions
• Remain abreast of industry standards
• Attend conferences, workshops and seminars
Accomplishments
• Disaster Recovery and Business Continuity Plan
• Risk Assessments for Critical Business Units
• Successful Mainframe Recovery Tests
• Designed and implemented infrastructure for central computing environment (academic & administrative) in secondary data center.
• Implementation of recovery strategies in secondary data center
• Creation of Administrative IT Disaster Recovery Unit
Illustration of Various DR Deployments
The diagram shows four deployment patterns:
• Fault-tolerant cluster (file and print services): node A runs A production and holds B's configuration; node B runs B production and holds A's configuration
• Distributed deployment (hosted systems)
• Co-processing and load-balancing (ERP): the same production service (A) runs on several servers
• Data replication (mainframe)
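The co-processing and load-balancing pattern can be sketched as a balancer that spreads requests across every data center currently marked healthy and stops sending traffic to a site once it is marked down. This is a minimal Python illustration only; the class name, the site labels and the manual mark_down/mark_up calls (standing in for real health checks) are assumptions, not a description of how NC State configured its ERP load balancing.

```python
from itertools import cycle


class TwoSiteBalancer:
    """Round-robin request routing across data centers marked healthy."""

    def __init__(self, sites):
        self.sites = list(sites)
        self.healthy = set(self.sites)
        self._order = cycle(self.sites)  # fixed rotation over all known sites

    def mark_down(self, site):
        # Stop routing to a site, e.g. after a failed health check.
        self.healthy.discard(site)

    def mark_up(self, site):
        # Resume routing to a known site once it has recovered.
        if site in self.sites:
            self.healthy.add(site)

    def route(self):
        # Advance the rotation at most once per site, skipping unhealthy ones.
        for _ in range(len(self.sites)):
            site = next(self._order)
            if site in self.healthy:
                return site
        raise RuntimeError("no healthy data center available")


balancer = TwoSiteBalancer(["DC I", "DC II"])
print([balancer.route() for _ in range(4)])  # both sites share the load
balancer.mark_down("DC I")
print([balancer.route() for _ in range(3)])  # all traffic now goes to DC II
```

Round-robin is the simplest policy to show; a production balancer would usually weight sites by capacity and probe them automatically.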
Enterprise Resource Planning (ERP) Deployment
Systems: Financial System; Human Resources (Version 8.8); Student Information System (under construction)
(Diagram: campus users connect to ERP web, application and batch servers in both DC I and DC II, with DB servers and their data on a Storage Area Network.)
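To reason about whether an ERP deployment split across DC I and DC II survives the loss of one data center, one can check that every required tier still has at least one server outside the failed site. The sketch below assumes a hypothetical inventory; the server names, tier names and the choice of which tiers are "required" are illustrative and are not read from the diagram.

```python
from typing import Dict, List, Tuple

# Hypothetical inventory format: tier -> [(server name, data center), ...]
Inventory = Dict[str, List[Tuple[str, str]]]

# Assumed set of tiers the ERP service needs in order to run.
REQUIRED_TIERS = ("web", "application", "batch", "database")


def tiers_lost_with_site(inventory: Inventory, lost_site: str) -> List[str]:
    """Return the required tiers left with no server if `lost_site` fails."""
    lost = []
    for tier in REQUIRED_TIERS:
        survivors = [name for name, site in inventory.get(tier, []) if site != lost_site]
        if not survivors:
            lost.append(tier)
    return lost


# Hypothetical example: every tier but the database exists in both sites.
erp = {
    "web": [("web-1", "DC I"), ("web-2", "DC II")],
    "application": [("app-1", "DC I"), ("app-2", "DC II")],
    "batch": [("batch-1", "DC I"), ("batch-2", "DC II")],
    "database": [("db-1", "DC I")],
}
print(tiers_lost_with_site(erp, "DC I"))   # the database tier has no copy elsewhere
print(tiers_lost_with_site(erp, "DC II"))  # every tier still has a server
```

An empty result means the service can, in principle, keep running from the surviving site; it says nothing about capacity or data currency there.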
Summary and Future Steps
(Diagram: DC I and DC II are each shown with the following components, each site with its own Storage Area Network and data)
• Hosted systems and infrastructure
• Active Directory/Windows
• Novell Directory Services/Novell
• Citrix
• ERP web, application, batch and DB servers
• Development, mainframe, web and database servers
• Email/Calendar, Anti-SPAM
• File/Print, User Home
• Backup/vaulting
Administrative IT Disaster Recovery Unit
Mission
• Ensure minimal risk of major disruptions to critical University systems and processes in the event that all or part of its computer operations are rendered inoperable.
• Ensure timely recovery of infrastructure and services in the event of a disruption.
• Ensure that business continuity plans are available and viable for each disruption scenario.
Risk Management
Risk Mitigation
• Prioritize Actions
• Evaluate Recommended Control Options
• Conduct Cost-Benefit Analysis
• Select Controls
• Assign Responsibility
• Develop Safeguard Implementation Plan
• Implement Selected Controls
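The "Conduct Cost-Benefit Analysis" step is often done quantitatively with annualized loss expectancy (ALE = single loss expectancy × annual rate of occurrence), comparing the yearly loss a control removes with what the control costs per year. The slides do not say which method was used; this is a common technique swapped in for illustration, and the option names and dollar figures are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ControlOption:
    name: str
    annual_cost: float  # yearly cost of operating the control
    ale_after: float    # residual annualized loss expectancy with the control in place


def ale(single_loss_expectancy: float, annual_rate_of_occurrence: float) -> float:
    """Annualized loss expectancy for one threat."""
    return single_loss_expectancy * annual_rate_of_occurrence


def rank_controls(ale_before: float, options: List[ControlOption]) -> List[Tuple[str, float]]:
    """Return (name, net annual benefit) for controls that pay for themselves, best first."""
    scored = [(o.name, ale_before - o.ale_after - o.annual_cost) for o in options]
    return sorted((s for s in scored if s[1] > 0), key=lambda s: s[1], reverse=True)


# Hypothetical: a $100,000 loss expected once every two years.
baseline = ale(100_000, 0.5)
options = [
    ControlOption("option A", annual_cost=30_000, ale_after=5_000),
    ControlOption("option B", annual_cost=8_000, ale_after=20_000),
    ControlOption("option C", annual_cost=60_000, ale_after=2_000),
]
print(rank_controls(baseline, options))  # option C costs more than it saves and is dropped
```

The ranking feeds the "Select Controls" step; non-monetary factors (safety, legal mandates, reputation) still have to be weighed separately.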
Risk Assessment
• System Characterization
• Threat Identification
• Vulnerability Identification
• Control Analysis
• Likelihood Determination
• Impact Analysis
• Risk Determination
• Control Recommendations
• Results Documentation
NIST SP 800-30
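For the Likelihood Determination, Impact Analysis and Risk Determination steps, the 2002 edition of NIST SP 800-30 gives a sample risk-level matrix: likelihood High = 1.0, Medium = 0.5, Low = 0.1; impact High = 100, Medium = 50, Low = 10; risk = likelihood × impact, rated High above 50, Medium above 10 up to 50, and Low from 1 to 10. The sketch below encodes those sample values; an institution may well use its own scales, and the function name is hypothetical.

```python
from typing import Tuple

# Sample scales from the NIST SP 800-30 (2002) risk-level matrix.
LIKELIHOOD = {"high": 1.0, "medium": 0.5, "low": 0.1}
IMPACT = {"high": 100, "medium": 50, "low": 10}


def risk_level(likelihood: str, impact: str) -> Tuple[str, float]:
    """Combine a likelihood rating and an impact rating into a risk level."""
    score = LIKELIHOOD[likelihood] * IMPACT[impact]
    if score > 50:        # High: >50 to 100
        return "high", score
    if score > 10:        # Medium: >10 to 50
        return "medium", score
    return "low", score   # Low: 1 to 10


# Print the full 3x3 matrix, one likelihood row at a time.
for lk in ("high", "medium", "low"):
    print(lk, [risk_level(lk, im)[0] for im in ("low", "medium", "high")])
```

Only the High likelihood, High impact cell reaches the High risk band with these sample scales, which is why the resulting levels drive the "Prioritize Actions" step of mitigation.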
(Diagram: ERP architecture by tier, from users through web/application clients to databases)
Distributed Web Servers
• Web Servers 1-6: SunFire v240 (2-cpu, 8GB), Veritas Foundation Suite
• Financials, FinTrain and FinRep on WebLogic v8.1, each with an iPlanet v6.0 proxy server for authentication
Distributed Application Transaction Servers (AppServers)
• Application Servers 1 and 2: SunFire v1280 (12-cpu, 24GB), Veritas Foundation Suite
• Application Servers 3 and 4: SunFire v480 (4-cpu, 8GB), Veritas Foundation Suite
• Additional application server: SunFire v240 (2-cpu, 4GB), Veritas Foundation Suite
• Financials, FinRep and FinTrain AppServers on Tuxedo v6.5
• HR, HRRep and HRTrain AppServers on Tuxedo, hosted on a Sun Enterprise 450 application server, Solaris 7 (4 SparcII cpu, 4GB), Veritas Foundation Suite
Distributed Process Schedulers (Batch Servers)
• Batch Servers 1-5: SunFire v240 (2-cpu, 4GB), Veritas Foundation Suite
• Financials, FinRep and FinTrain batch servers on Tuxedo v6.5
• HR Tuxedo process scheduler on a Sun Enterprise 10000 (E10K) OS domain batch server, Solaris 7 (8 SparcII cpu, 8GB), Veritas Foundation Suite
Databases
• Data Server 1: SunFire E25K OS domain (12 SparcIV cpu, 96GB), Veritas DBE Oracle with FlashSnap; Oracle 9i databases for Financials (OLTP), FinRep (Reporting) and FinTrain (Training)
• Data Server 2: Sun E10K OS domain, Solaris 8 (12 SparcII cpu, 12GB), Veritas DBE Sybase with FlashSnap; Sybase ASE 12.0.0.6 databases for HR, HRRep and HRTrain
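Given a tier-by-tier inventory such as the ERP architecture above, a simple check for single points of failure is to count distinct hosts per tier and flag any required tier served by fewer than two. The host names and tiers in the example are hypothetical and only loosely modeled on the diagram; a real review would also have to consider shared storage, network paths and site dependencies.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def single_points_of_failure(
    hosts: Iterable[Tuple[str, str]],
    required_tiers: Iterable[str],
    min_hosts: int = 2,
) -> List[str]:
    """Return required tiers served by fewer than `min_hosts` distinct hosts.

    `hosts` is a sequence of (host name, tier) pairs; duplicate pairs are ignored.
    Tiers with no hosts at all count as zero and are flagged too.
    """
    counts = Counter(tier for _, tier in set(hosts))
    return sorted(tier for tier in required_tiers if counts[tier] < min_hosts)


# Hypothetical inventory, only loosely modeled on the diagram above.
inventory = [
    ("web-1", "web"), ("web-2", "web"),
    ("app-1", "application"), ("app-2", "application"),
    ("batch-1", "batch"),
    ("db-1", "database"),
]
print(single_points_of_failure(inventory, ["web", "application", "batch", "database"]))
```

Counting hosts catches only the most obvious gaps; two hosts in the same rack or data center can still fail together, which is the site-level question the secondary data center addresses.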
Process Mapping
Infrastructure
• Total DR through distributed high availability
• Client Recovery Solutions
• Application Restoration
• Establish collaborative partnerships with other Universities
Client Recovery Solution(s)
Critical Business Units
• Advancement Services
• All Campus Network
• Budget Office
• College of Agriculture and Life Sciences - Personnel Office
• ComTech - Data Networking
• ComTech - Telecommunications
• Contracts and Grants
• Controller's Office
• Enterprise Application and Database Services
• EH&S - Business Continuity
• EH&S - Campus Police
• EH&S - Emergency Response
• EH&S - Environmental Affairs
• EH&S - Health and Safety
• EH&S - Industrial Hygiene
• EH&S - Insurance and Risk Management
• EH&S - Radiation Safety
• EH&S - Transportation
• EH&S - Waste Management
• Enrollment Management - Admissions
• Enrollment Management - Office of Scholarships & Financial Aid
• Enrollment Management - Registration and Records
• Enterprise Technology Services and Support
• Facilities - Construction Management
• Facilities - Design and Construction Services
• Facilities - Operations
• Facilities - University Architect
• Fire Protection
• Foundations Accounting & Investments
• HR - Benefits
• HR - Employment & Compensation
• HR - Human Resource Information Management
• HR - Payroll
• ITD - Business Services
• ITD - Computer Operations
• ITD - Computer Services
• ITD - Systems
• Libraries - Administration
• Materials Management - Materials Support
• Materials Management - Purchasing
• Materials Management - University Graphics
• Real Estate
• Student Health Services
• University Cashier's Office
• University Dining
• University Housing
Communication
• Consistency in plan updating
• Training
• Partnering
• Emergency Communication standardization
– Call Trees
– Mobile Devices
– Website
– Incident Command System Call Center
– Incident Report Plan
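A call tree can be modeled as a mapping from each caller to the people they are responsible for contacting; walking it breadth-first gives the calling waves and shows who is missed when someone does not answer. This is an illustrative Python sketch; the names are hypothetical, and the rule that an unreachable caller's whole branch goes uncalled is an assumption (many plans add an escalation path for exactly that case).

```python
from typing import AbstractSet, Dict, List, Set, Tuple


def run_call_tree(
    tree: Dict[str, List[str]],
    start: str,
    unreachable: AbstractSet[str] = frozenset(),
) -> Tuple[List[List[str]], Set[str]]:
    """Walk a call tree breadth-first, one wave of calls per round.

    `tree` maps each person to the people they must call. Anyone in
    `unreachable` does not answer, so nobody on their branch is called.
    Returns the calling rounds and the set of people never reached.
    """
    everyone = {start} | set(tree) | {p for callees in tree.values() for p in callees}
    rounds: List[List[str]] = []
    reached: Set[str] = set()
    wave = [] if start in unreachable else [start]
    while wave:
        rounds.append(wave)
        reached.update(wave)
        next_wave: List[str] = []
        for caller in wave:
            for callee in tree.get(caller, []):
                if callee in reached or callee in unreachable or callee in next_wave:
                    continue
                next_wave.append(callee)
        wave = next_wave
    return rounds, everyone - reached


# Hypothetical tree for one Critical Business Unit.
tree = {"coordinator": ["alice", "bob"], "alice": ["carol", "dan"], "bob": ["erin"]}
print(run_call_tree(tree, "coordinator"))
print(run_call_tree(tree, "coordinator", unreachable={"alice"}))
```

The second call shows why call trees are tested: one missed call silently drops everyone below that person.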
IT Disaster Categorization
• Category 1: A single person or group in a Critical Business Unit (CBU) is unable to perform their critical functions
• Category 2: An entire CBU is unable to perform its critical functions
• Category 3: Multiple CBUs are unable to perform their critical functions
• Category 4: Non-CBUs are unable to perform their critical functions
• Category 5: A widespread event that impacts the entire University
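The five categories can be read as a decision procedure that checks the broadest condition first. The sketch assumes that a higher category takes precedence when several conditions hold (for example, a University-wide event that also takes down CBUs is Category 5); the slide does not state that rule, and the Incident fields are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Incident:
    partial_cbu_outage: bool = False  # one person or group inside a CBU is affected
    cbus_down: int = 0                # CBUs unable to perform their critical functions
    non_cbus_down: bool = False       # units outside the CBU list are affected
    university_wide: bool = False     # widespread event affecting the entire University


def categorize(incident: Incident) -> Optional[int]:
    """Return the IT disaster category (1-5), checking the broadest condition first."""
    if incident.university_wide:
        return 5
    if incident.non_cbus_down:
        return 4
    if incident.cbus_down > 1:
        return 3
    if incident.cbus_down == 1:
        return 2
    if incident.partial_cbu_outage:
        return 1
    return None  # no critical function affected


print(categorize(Incident(cbus_down=2)))
```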
Goals
• Total DR through distributed high availability
• Standardized Emergency Communications
• Immediate Client Recovery Solutions
• Improved RTO (Recovery Time Objective)
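RTO (Recovery Time Objective) is the target time within which a service must be restored after a disruption. When recovery steps depend on one another, the achievable recovery time is the longest chain of dependent steps, assuming independent steps can run in parallel. This is a minimal sketch of that critical-path calculation; the step names, hour estimates and 12-hour target are hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Dict, List, Tuple


def recovery_time(steps: Dict[str, Tuple[float, List[str]]]) -> float:
    """Hours until every recovery step is done.

    `steps` maps a step name to (duration in hours, prerequisite step names).
    Assumes steps with no dependency between them run in parallel, so the
    result is the length of the longest chain of dependent steps.
    Every prerequisite must itself appear as a key in `steps`.
    """
    graph = {name: set(deps) for name, (_, deps) in steps.items()}
    finish: Dict[str, float] = {}
    for name in TopologicalSorter(graph).static_order():
        hours, deps = steps[name]
        finish[name] = hours + max((finish[d] for d in deps), default=0)
    return max(finish.values(), default=0)


# Hypothetical restoration plan and target.
plan = {
    "storage": (4, []),
    "network": (2, []),
    "database": (3, ["storage"]),
    "application": (2, ["database", "network"]),
    "web": (1, ["application"]),
}
rto_hours = 12
total = recovery_time(plan)
print(total, "hours; meets RTO" if total <= rto_hours else "hours; misses RTO")
```

Shortening steps that are not on the longest chain (here, the network) does not improve the RTO; only the critical path does.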
Ann Harris
Asst Dir, Administrative IT Disaster Recovery
919-515-9228
http://www.fis.ncsu.edu/dr