apdip disaster mgmt
Disaster Management (i.e., Business Continuity)
Josef C. Mueller, Associate Partner
Objective
To discuss information systems disaster management and the formation of backup and disaster recovery plans
Agenda
• Introduction
• Disaster Recovery Approach
• DR Team Organization
• Case Study
• Example Disaster Recovery Services
• Open discussion
What is a Disaster?
Any unplanned event that requires immediate redeployment of limited resources
Sample Disasters
Natural Forces: Fire; Environmental Hazards; Flood / Water Damage; Extreme Weather
Technical Failure: Power Outage; Equipment Failure; Network Failure; Software Failure
Human Interference: Criminal Act; Human Error; Loss of Users; Explosions
Introduction
Some Examples of Disasters
The Chicago Flood: The underground flood of Chicago on Monday, April 13, 1992, proved to be one of the worst business disasters ever. 230 buildings lost power because water threatened their underground power sources.
The World Trade Center Explosion: Businesses were forced to evacuate the World Trade Center on February 26, 1993, when a bomb exploded in the underground parking garage. Companies that were affected by the disruption were unable to remove critical equipment and documents.
The San Francisco Earthquake: The October 17, 1989 quake measured 7.0 on the Richter scale. A section of the Bay Bridge collapsed. The city lost the use of its main business section because of collapsed buildings and the loss of electricity.
Hurricane Andrew: On August 24, 1992, Hurricane Andrew hit the South Florida area. Many businesses suffered physical and financial losses from the hurricane; the valuation of destroyed property was the largest in US history at the time.
The Kobe Quake: The devastation on January 17, 1995, was the worst in the port city of Kobe, where the 7.2 magnitude quake toppled roadways, wrecked docks, severed communication lines and kept the city in flames into the next day.
Oklahoma City Bombing: On April 19, 1995, a terrorist bomb exploded in front of the nine-story Alfred P. Murrah Federal Building in downtown Oklahoma City. The blast destroyed one-third of the building from roof to ground, leaving a crater eight feet deep and 30 feet wide.
What is a Disaster Recovery Plan?
A management document for how and when to utilize resources needed to maintain selected functions when disrupted by agreed-upon incidents
Other names commonly used: Business Continuity Plan; Contingency Plans; Continuity Plans; Emergency Response Plans; Business Recovery Plans; Recovery Plans
When an incident occurs, the Disaster Recovery response activities are likely to be the following (at a high level).
[Flow diagram. Boxes: Incident; Assess Damage; Confirm Response Strategy; Execute Required Functions; Transfer to Alternate Location; Restore Primary Site; Prepare New Site; Transfer & Execute at New Site; Transfer & Execute at Primary Site; Return to Normal Operations; Assess DRP Effectiveness; Generate Change Requests]
What is the magnitude of an incident?
Regional Area; Local Area; Within 3 Blocks; To The Building; Within 3 Floors; On The Floor; Within The Room
Depending upon the magnitude of an incident, possible alternative sites include:
Within The Room; Within the Building; Within the Region; Outside the Region
Types of Controls
Integrity Controls: Policy; Methodology; Staffing; Education; Division of Responsibility; Audit; Error and Change Control; Reporting and Resolution; Test; Quality Assurance
Confidentiality Controls: Proprietary Information Policy; Ethics Statement; “Need to Know”, “Need to Withhold”; Classification Scheme; Records Management; Handling Procedures; Physical & Electronic Security Measures
Availability Controls: Asset Identification; Interruption Analysis; Controls Review; Impact Analysis; Data Backup; Off-site Storage; Avoidance Strategies; Mitigation Strategies; Early Detection & Notification; Recovery Strategies; Alternate Locations; Plans and Procedures; Vendor Relationships; Training; Testing
Types of Strategies
Avoidance Strategy: Redundant configuration to avoid incidents; Site-hardened facilities to resist incidents; Redundant utilities and hardware; Automated operation recovery plan
Mitigation Strategy: Early warning detection; Contractual agreements with vendors; Mirrored data and documents; Detailed migration recovery plan
Recovery Strategy: High-level recovery plan; Off-site data storage; Very responsive vendor relationships; Very knowledgeable employees
Types of Strategy Options: Hot site; Cold site; Self Backup; Service Bureau; Reciprocal Agreement
What is a Critical Business Function?
A specific entity management has decided is so significant to the business mission that, without it, the organization cannot successfully operate after an identified time period.
Types of Impact
Financial Loss: Lost Revenue; Lost Sales; Lost Market Share; Lost Opportunity
Extra Expense:
• Labor Cost: Recreate Lost Business; Recreate Lost Data; Use Manual Process
• Equipment Cost: Hardware / software; Telephones
• Money Cost: Delayed Receivable; Delayed Orders; New Interest; New Investments
Human Interference: Management Control; Employee Relations; Stockholder Relations; Public Image; Legal Exposure; Contractual Liability; Competitive Advantage
Criteria for a Critical Business Function
Timing Requirements: Minutes; Hours; Days; Weeks; Quarters; Special Situations
Interdependencies: Inputs and Outputs
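The two criteria above (timing requirements and interdependencies) can be combined into a recovery order. The following is a minimal sketch, not part of the original slides: the BusinessFunction class, the recovery_order function and the sample functions and hour values are all hypothetical. It assumes every listed input is itself a function in the list, restores a function only after its inputs, and among the functions that are ready picks the one with the tightest timing requirement first.

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class BusinessFunction:
    name: str
    max_downtime_hours: float                         # timing requirement (hypothetical unit)
    inputs: list[str] = field(default_factory=list)   # functions this one depends on


def recovery_order(functions):
    """Order functions so inputs come first; ties go to the shortest allowed downtime."""
    by_name = {f.name: f for f in functions}
    pending = {f.name: set(f.inputs) for f in functions}
    dependents = {f.name: [] for f in functions}
    for f in functions:
        for dep in f.inputs:
            dependents[dep].append(f.name)

    # Functions with no outstanding inputs are ready to be restored.
    ready = [(by_name[n].max_downtime_hours, n) for n, deps in pending.items() if not deps]
    heapq.heapify(ready)

    order = []
    while ready:
        _, name = heapq.heappop(ready)
        order.append(name)
        for child in dependents[name]:
            pending[child].discard(name)
            if not pending[child]:
                heapq.heappush(ready, (by_name[child].max_downtime_hours, child))

    if len(order) != len(functions):
        raise ValueError("circular dependency between functions")
    return order


# Hypothetical sample data, for illustration only.
example = [
    BusinessFunction("order entry", 4, inputs=["network", "customer database"]),
    BusinessFunction("payroll", 72, inputs=["network"]),
    BusinessFunction("network", 24),
    BusinessFunction("customer database", 8, inputs=["network"]),
]
print(recovery_order(example))
```

In this sample, the network is restored first even though its own timing requirement is looser than that of order entry, because order entry cannot run without it. A real Critical Business Functions Matrix would carry more attributes than this sketch does.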
Cost of Control vs. Impact
[Chart. Curves labeled: Cost of Impact ($); Cost of Control ($). Other labels: Impact; Cost]
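One way to read a cost-of-control versus cost-of-impact comparison is to look for the option with the lowest combined cost. The sketch below is an illustration under stated assumptions, not the presenter's method: the ControlOption class, the cheapest_option function, the option names and every dollar figure are hypothetical. It assumes each option can be given one control cost and one expected remaining impact in the same currency and time period.

```python
from dataclasses import dataclass


@dataclass
class ControlOption:
    name: str
    control_cost: float      # what the control costs to put in place and run
    expected_impact: float   # expected loss that remains with this control in place

    @property
    def total_cost(self):
        return self.control_cost + self.expected_impact


def cheapest_option(options):
    """Return the option whose control cost plus remaining impact is lowest."""
    return min(options, key=lambda option: option.total_cost)


# Hypothetical figures, for illustration only.
options = [
    ControlOption("no plan", 0, 900_000),
    ControlOption("off-site backups only", 50_000, 400_000),
    ControlOption("cold site", 150_000, 200_000),
    ControlOption("hot site", 500_000, 20_000),
]
best = cheapest_option(options)
print(best.name, best.total_cost)
```

With these made-up numbers the most expensive control is not the best choice: past a certain point, extra spending on control costs more than the impact it removes.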
Implementing Recovery Plans is not an easy task!
• Recovery prevention techniques are inadequate
• Increase the level of user security awareness and education
• No recovery plan at all
• Plan is stored on the “ultimate” computer (in the IT director’s head)
• Establish short-term alternate processing procedures
• Removal of systems running on obsolete machines
• Recovery plans are too theoretical and not geared to the organization’s needs
• Plans are unwieldy
• Recovery plans are in a written format and/or are not updated
• Backup not tested
• Plans not tested
• Plans are located in the computer room or the building
• Plans are too grandiose (EXPENSIVE)
• Plan does not address PCs / workstations
• “People Factors” are not taken into account
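"Backup not tested" is one of the items listed above. As a minimal sketch of one possible check, assuming a backup has been restored into a scratch directory, the code below compares SHA-256 checksums of the original files against the restored copies. The function names sha256_of and verify_restore and the directory layout are hypothetical; a real restore test would also cover databases, permissions and application start-up.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(source_dir, restored_dir):
    """Return relative paths that are missing or differ in the restored copy."""
    source_dir, restored_dir = Path(source_dir), Path(restored_dir)
    problems = []
    for src in sorted(source_dir.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source_dir)
        restored = restored_dir / rel
        if not restored.is_file() or sha256_of(src) != sha256_of(restored):
            problems.append(rel.as_posix())
    return problems


if __name__ == "__main__":
    # Tiny self-contained demonstration with throwaway directories.
    with tempfile.TemporaryDirectory() as tmp:
        src, dst = Path(tmp, "src"), Path(tmp, "dst")
        src.mkdir()
        dst.mkdir()
        (src / "ledger.txt").write_text("ledger")
        (dst / "ledger.txt").write_text("ledger (corrupted)")
        print(verify_restore(src, dst))
```

An empty result means every source file was found in the restored copy with identical contents; anything else is a reason to investigate before the backup is needed.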
The following Life Cycle model is useful when thinking about Disaster Recovery.
[Life cycle diagram. Labels: Planning Activities; Maintenance Activities; Recovery Activities; Up-to-Date DRP; Normal Operations; Changes; Changes from tests; Changes from event]
Disaster Recovery Approach
Planning
The primary objective for the Planning Phase is to gain management consensus on the focus areas and scope of a Disaster Recovery Plan that will address major business risks.
Implementation
The primary objective for the Implementation Phase is to develop, test, and roll out a Disaster Recovery Plan. The implementation phase could be longer or shorter, depending upon the scope, approach, and staffing defined during the Scoping and Risk Assessment phase.
Phases shown: Scoping & Risk Assessment; Recovery Strategy Development; Disaster Recovery Plan; Approval; Training & Testing
Determine the focus areas and scope for the Disaster Recovery Plan implementation phase
Activities
• Management Briefing
• Questionnaires
• Interviews
• Focus Groups
• Workshop
Key Deliverables
• Scoping and Risk Assessment Report
• Requirements Summary
• Current Capability Summary
• Critical Business Functions Matrix
• Critical Systems Matrix
Develop strategies for each of the most critical systems based upon the outcome of the Scoping and Risk Assessment phase
Activities
• Develop Strategies
• Select Spinoff Projects
Key Deliverables
• The Recovery Strategy Report
• Alternatives and recommendations
Develop detailed plans for business continuity based upon the specific strategy identified for each critical system
Activities
• Develop Recovery Plan
Key Deliverable
• Recovery plan includes
    • Assessment Plan & Procedures
    • Notification Procedure
    • Recovery Center Procedure
    • Migration Plan (facilities, data, people)
    • Team Organization (Roles & Responsibilities)
Develop detailed plans for business continuity based upon the specific strategy identified for each critical system (continued)
Activities
• Develop Maintenance Procedures
• Prepare facilities and infrastructure
• Recovery Center location, facilities and required components
Key Deliverable
• Maintenance Procedures include
    • Responsibility matrix for maintenance
    • Testing strategy
    • How to update the Recovery Procedure
    • Ongoing Center recovery training schedule
Provide training to the recovery team and conduct the testing based upon the testing approach documented in the Maintenance Procedures
Activities
• Prepare training materials
• Conduct & Evaluate Training
Key Deliverables
• Training material
• Trained staff
Get the Disaster Recovery Plan approved and roll it out to the organization
Activities
• Revise plan (if necessary)
• Approve the Disaster Recovery Plan
Key Deliverable
• Management Sign-off
• Publication & Distribution of the Disaster Recovery Plan
An Example of Disaster Recovery Team
[Organization chart. Boxes: DRP Management Team; Disaster Recovery Director; Disaster Recovery Coordinator; Administrative Support; Site Restoration; Customer Liaison; System Software and Database Administration; Security; Computer Operation and Off-site Storage; Network Delivery; Application Support; Services Delivery; Production Application Support]
DR Team Organization
Examples of Data Center Roles & Responsibilities

Title: DR Management Team
Role: Act as the steering committee of the DR team
Responsibilities:
• Provide overall management support to the DR team
• Responsible for strategic decisions and key requirements or changes on the DRP
• Make key decisions according to the DRP

Title: Disaster Recovery Director
Role: Act as an advisor to the DR management team
Responsibilities:
• Oversee the activities of the DR team
• Budget for future DR requirements
• Communicate with other management to deal with the business process and recovery procedures

Title: Administrative Support
Role: Provide administration support to the DR team
Responsibilities:
• Provide the DR team with administrative resources and facilities
• Co-ordinate with lawyers for court cases and handle legal documents
• Responsible for accounting matters on DR expenses
• Investigate the amount of damaged resources and insurance claims
Title: Disaster Recovery Coordinator
Role: Centralized coordination for the entire DR team
Responsibilities:
• Declare a disaster for each critical system component or for an entire site
• Inform the DR team of the decision
• Execute DR procedures and recovery strategies
• Ensure that the DRP is updated and tested on a regular basis

Title: Site Restoration
Role: Co-ordinate the recovery operations should a site be destroyed
Responsibilities:
• Organize security control for the disaster site and alternate processing site as required

Title: System Software and Database Administration
Role: Prepare recovery and restoration of software and databases
Responsibilities:
• Responsible for the restoration of hosts, servers and databases, data synchronization, etc.

Title: Customer Liaison
Role: Coordinate with users and customers on any recovery issue
Responsibilities:
• Notify users and clients of the disaster
• Issue updates of recovery progress and expected time of recovery
• Help on data center migration issues and work re-allocation

Title: Computer Operations and Off-site Storage
Role: Manage storage of the backups
Responsibilities:
• Provide ready access to the required backups
• Ensure the backups are stored in a secure environment

Title: Application Support
Role: Manage applications with regard to the DRP
Responsibilities:
• Manage application changes to ensure they are compliant with the DRP and vice versa

Title: Network Delivery
Role: Manage and monitor voice and data network
Responsibilities:
• Oversee the recovery of the communication environment
• Switch users to use the alternate network
• Co-ordinate with the communication service providers for WAN service recovery

Title: Security
Role: Review and monitor DR procedures
Responsibilities:
• Ensure the DR procedures comply with the firm's security and audit policies

Title: Service Delivery
Role: Manage IT service delivery
Responsibilities:
• Oversee the service management recovery
• Provide helpdesk and end-user support as in the DRP
• Work closely with Customer Liaison and the Disaster Recovery Coordinator to keep communication to the users synchronized with the DR team's activities
The Chicago Flood : Impact
• One of the worst business disasters
• 230 buildings lost power for a couple of days
• Valuable government records were in jeopardy
• Extensive impact on electrical and computing systems
• The greatest financial impact was on the CBOT (Chicago Board of Trade), which lost 25 billion in trading of 36 products
Case Study
The Chicago Flood : Disaster Recovery
• Using Alternate Site Services approach
• Providing the alternate site nearly identical to the customer’s damaged site
• Implemented by Comdisco Continuity Service
The Chicago Flood : Recovery Result
• Helped 2 Chicago banks resume operation within hours of evacuation
• 17 customers from the financial, brokerage, government and service/distribution industries were supported at their hot sites within half a day
The World Trade Center Explosion : Impact
• Building-wide power outage
• Structural damage and employee trauma; businesses were down
• Water problems because pipes were severed
• Injuries and deaths were reported; the building was considered a crime scene
The World Trade Center Explosion : Recovery
• Recovery plan of Fiduciary Trust, a banking and financial institution
• The data center switched automatically to its secondary power system
• Operations were moved to the alternate site in NJ, which was equipped with a computer network nearly identical to that of the bank
The World Trade Center Explosion : Recovery Result
• The system was down on Friday afternoon and was up and running by Monday morning as if nothing had happened
• Employees retained their usual telephone numbers
• Transactions went through the same as always
• Customers couldn’t even detect that the bank was no longer operating from the World Trade Center
Examples of Disaster Recovery Services
Alternate Sites
Provide alternate site nearly identical to the customer’s damaged site
Business Impact Analysis
Provide services such as defining disaster plans and addressing exposures to business and recovery administrators
Certification
Provide services such as certifying qualified individuals in the discipline and promoting the credibility and professionalism of certified individuals
Example Disaster Recovery Services
Education Classes
Creating a base of common knowledge for the business continuity/disaster recovery planning industry through education, assistance, and the promotion of international standards
On-Site Recovery Facilities
Manage the mobilization of an on-call response team, prepare a pre-designated site, erect temporary pre-engineered structures, install mechanical and electrical systems, and coordinate move-in activities
Satellite Communication
Provide satellite telecommunications products and services
Service Providers : Consulting Services
Andersen Consulting www.ac.com
Bell Atlantic Federal CommGuard www.commguard.com
Comdisco www.comdisco.com
Computer Security Consultants, Inc. www.crciweb.com
GSA Disaster and Business Recovery www.gsa-gsa.com
Intessera Technologies Group www.intessera.com
Service Providers : Alternate Site Services
ARC Disaster Recovery Services www.arcdrs.com
Comdisco www.comdisco.com
HP Business Recovery Services www.hp.com
IBM Business Recovery Services www.brs.ibm.com
SunGard Recovery Services, Inc. recovery.sungard.com
Service Providers : Computer Quick-ship, Hardware Replacement
El Camino www.elcamino.com
Open discussion
Q & A