TPL Business Continuity Management System (BCMS)
How vulnerable are we?
• The Tonga Trench is 10,800 m deep and located 200 km east of the Kingdom of Tonga
• The Tonga Trench is the fastest-moving trench in the world (24 cm/year)
How vulnerable are we?
• Storm surge – $1.4 million
• Sleeping giant – last eruption was in 1946
• Tornadoes – the latest event was in September 2004
• Most recent event (15 Mar 2009) – Hunga Tonga, about 60 km to the NNW of Nuku’alofa (Tongatapu), erupted
• NTT tsunami, 30 Sept 2009 – three waves were reported, with the last one reaching six metres
• Cyclones – average of 1 cyclone/year; Cyclone Waka, 2001 ($104m)
• Earthquake – 4 May 2006 (7.8 magnitude)
• Coastal flooding – 9 Jan 2009
The History of Business Continuity
Disaster Recovery Planning
Business Continuity Planning
Business Continuity Management System
Alternative Planning / Plan B
Fallback Plans, Contingency Plans
IT or Technical Contingency Plans
Organization wide Contingency Plans
Holistic Contingency Plans
BCP: Process of developing advance arrangements and procedures that enable an organization to respond to an event in such a manner that critical business functions continue with planned levels of interruption or essential change.
DRP: The advance planning and preparations that are necessary to minimize loss and ensure continuity of the critical business functions of an organization in the event of disaster. The technological aspect of business continuity planning.
Timeline markers: 11/9/2001; mid 2007
What is the System?
A Business Continuity Management System is:
• A living system; managed every day; updated whenever the situation changes
• Holistic
  – Requires a strategy/policy
  – End-to-end critical business process restoration (focus is not asset recovery only)
  – Communication – clients, employees, emergency organisations
  – Integration with business processes of other Business Units (BUs)
  – Addresses employee safety & wellbeing
What is a BCM System?
RISK has four key components:
1. Threats: fire, earthquake, power failure, loss of key staff etc.
2. Assets: human, mission-critical systems and infrastructure, suppliers, clients, buildings, information and records
3. Vulnerabilities: weaknesses in assets such as a single point of failure, inadequacies in fire protection, poor staffing levels, unreliable IT security, inefficient data backup etc.
4. Impact: financial or non-financial (e.g. reputational, health & safety impacts)
Risk description (example):
“A significant financial and reputational impact to the Company as a result of the IT Manager being unable to restore, on time, the Billing Server which was damaged by a power surge.”
Describing BCM Risk
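One way to keep risk descriptions consistent is to record the four components separately and assemble the sentence from them. The sketch below is illustrative only: the `BcmRisk` class, its field names and the sentence template are assumptions, not part of TPL's risk register or of Quantate.

```python
from dataclasses import dataclass


@dataclass
class BcmRisk:
    """Hypothetical record holding the four risk components."""
    threat: str         # e.g. a power surge
    asset: str          # e.g. the Billing Server
    vulnerability: str  # weakness that lets the threat cause harm
    impact: str         # financial or non-financial consequence

    def describe(self) -> str:
        # Assemble a one-sentence risk description from the components.
        return (
            f"{self.impact} to the Company as a result of {self.asset} "
            f"being damaged by {self.threat} while {self.vulnerability}."
        )


billing_risk = BcmRisk(
    threat="a power surge",
    asset="the Billing Server",
    vulnerability="the IT Manager is unable to restore it on time",
    impact="A significant financial and reputational impact",
)
print(billing_risk.describe())
```

Keeping the components apart also makes it easier to see which control addresses which part of the risk.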
Risk Controls
[Diagram: PREVENTION CONTROLS → ACCIDENT → MITIGATION CONTROLS → HARM]
• Prevention controls: reduce likelihood
• Mitigation controls: reduce consequence
BCM Risks
• BCM Plans are mitigation controls (i.e. they minimise consequences) rather than prevention controls
• BCM Risks are managed through Quantate as part of TPL’s Risk Management Program
BCM Assumptions
• A secondary or alternative site is always available to the company to restore business-critical processes if the primary site is not accessible due to damage sustained in a disaster.
• Minimum resources (e.g. staff, IT equipment etc.) are still available to restore business-critical functions following a disaster.
• Popua power station is unharmed and the distribution network assets have sustained only minor damage.
• National communication methods (i.e. TCC or Digicel) are available to communicate with the public.
• BCM plans are not applicable to nationwide disasters (e.g. tsunami) that may have damaged all of the company’s primary and secondary sites and its generation and distribution assets.
BS25999 Standard
• Requires a policy statement
• Requires focusing on restoring end-to-end processes (not just equipment or machinery)
• Requires RTO/RPO analysis for all critical processes
• Requires a risk assessment and mitigation
• Requires a hierarchy of plans (IMP, BURP, ERP etc.)
• Requires a Command Centre and DR sites
• Requires a Call Tree
• Requires periodic testing, maintenance & auditing of BCM plans
Planning
• BCM Policy
• Steering Committee
• Structure, roles & responsibilities
• BCM scope & objectives
• Top management commitment
Implementation & Operation
• Identify BC threats
• Business impact analysis
• Process criticality analysis
• Gap analysis
• Risk assessment
• Recovery strategies & scenarios
• Generating BCM Plans/BCM Manual
Testing & Auditing BCM plans
• Periodic tests – table top exercises, simulations, live exercises
• Periodic audits
• Management review
Maintaining and Improving
• Ongoing training
• Update plans whenever there is a major change to the company processes/structure
• Embedding BCM in the culture
PLAN – DO – CHECK – ACT
Overview of BS 25999:2007
BCM Plan Portfolio (TPL)
| BCM Plan | Description | Team Involved |
|---|---|---|
| Business Continuity Policy | A policy framework underpinning the need for and commitment to Business Continuity Planning | CEO |
| Incident Management Plan (IMP) | Highest-level plan; addresses strategic issues in incident management such as board reporting, media relations, liaison with insurers/brokers, salvage companies and emergency organizations, etc. | Incident Management Team |
| IT Disaster Recovery Plan (IT DRP) | Provides procedures for fail-over to the DR Site and for recovering IT systems such as servers, computers, phones, internet, office equipment etc. | IT Disaster Recovery Team |
| Business Unit Restoration Plan (BURP) | Provides instructions to each Business Unit Manager and his/her team to move to the DR Site and resume their critical business processes in conjunction with the Incident Management Plan and IT Disaster Recovery Plan. Individual BURPs are to be developed for each physical location. | Business Unit Emergency Response Team |
| Emergency Response Plan (ERP) | Contains specific emergency response procedures that must be followed during different stages of a disaster (e.g. tsunami, flood, hurricane, fire, earthquake etc.). Put simply, the BURP contains procedures for restoration of critical business processes and the ERP contains procedures for protection of assets, staff, and public health & safety. | Incident Management Team and Emergency Response Teams |
| Call Tree | Contains emergency contact information, next-of-kin contact information etc. for all staff members under each Manager, Supervisor & Team Leader | All Managers, Supervisors, and Team Leaders |
• A BURP is a plan used to restore the entire business unit, including the key processes: which employees are moved to the DR site, activation of the call tree, coordination with other business units, and resumption of business as normal (holistic).
• A DR Plan is a technical plan designed to restore equipment and machinery back to normal (e.g. IT DR Plan).
• Companies use both names interchangeably.
BURP vs. DR Plan
• Identify the incident
• Assess damage and identify lost critical processes
• Declare a disaster if the critical operations cannot be restored within the primary premises
• Shut down the power supply for the safety of the public
• Activate BURPs/DRPs. Move into the DR site to restore operations at a minimum acceptable level
• Restore the damaged distribution assets and network
• Restore power generation
• Resume business as usual at the DR Site
• Monitoring
• Review & documentation
Managing Incidents Within the Capability of TPL Plans
Conducting Business Impact Analysis (BIA)
RTO & RPO
• The Recovery Time Objective (RTO) is the goal for how quickly TPL needs to recover the interrupted processes.
• The Recovery Point Objective (RPO) is the point in time to which data must be restored to successfully resume the interrupted processes (often thought of as the time between the last backup and when an “interruption” occurred).
(Last Backup)
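The two objectives can be checked against an actual or exercised incident with simple date arithmetic. The sketch below is a minimal illustration; the function name, the returned dictionary keys and the timestamps are assumptions made up for the example. An RPO of `None` stands for processes whose RPO is recorded as "NA".

```python
from datetime import datetime, timedelta
from typing import Optional


def check_objectives(last_backup: datetime, interruption: datetime,
                     restored: datetime, rto: timedelta,
                     rpo: Optional[timedelta] = None) -> dict:
    """Compare observed data loss and downtime with the RPO and RTO."""
    data_loss = interruption - last_backup  # work done since the last backup
    downtime = restored - interruption      # time taken to recover the process
    return {
        "data_loss": data_loss,
        "downtime": downtime,
        "rpo_met": rpo is None or data_loss <= rpo,
        "rto_met": downtime <= rto,
    }


# Hypothetical incident: backup at midnight, interruption 18 hours later,
# process restored 36 hours after the interruption.
result = check_objectives(
    last_backup=datetime(2024, 1, 1, 0, 0),
    interruption=datetime(2024, 1, 1, 18, 0),
    restored=datetime(2024, 1, 3, 6, 0),
    rto=timedelta(days=2),
    rpo=timedelta(days=1),
)
print(result["rpo_met"], result["rto_met"])
```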
• BIA identifies impacts (financial & non-financial) due to malfunction of company critical processes resulting from disasters
• The goal of BIA is to identify, categorise and prioritise the mission-critical processes and the resources (e.g. technology, infrastructure, vital records, personnel, suppliers) required to run these processes within the company.
• Some examples of such processes are customer service & support, order & data processing, payroll, IT & communication, and purchasing and production.
• BIA also identifies interdependencies (telephones, IT facilities, office space etc.) between different business units within the company.
• The priorities of critical processes for subsequent resumption are based on RTO and RPO objectives of each process.
Business Impact Analysis
• Administer the BIA Questionnaire with each BU Manager (template supplied); collect, review and analyse the data
• Consider the worst-case scenario: a total loss of personnel, facilities and property
• Does the disaster affect the critical processes?
• Determine impacts if a process is lost due to the disaster:
• Identify and quantify financial impacts (recovery costs, production losses & revenue losses)
• Identify non-financial impacts
  – Impact on staff or public wellbeing
  – Impact of damage to premises/assets/records
  – Impact of breaches of statutory regulations
  – Damage to reputation
  – Deterioration of product/service quality
  – Environmental damage
BIA – Identifying Critical Processes
• Determine the minimum resources required for minimum acceptable recovery of each process, manually or using alternative processes.
  – Staff – skills and knowledge
  – Secondary premises
  – Plant, equipment, software, data/records
  – External service providers
• Determine RTOs and RPOs for each process. Note that the shorter the RTO, the greater the cost of recovery. Also, longer RTOs increase the chance that recovery will not be achieved within the MTO.
• Determine whether the company is currently able to achieve its RTO/RPO objectives. If not, flag them as risks to be analysed at a later stage
• Rank the processes based on impacts to determine the priority of recovery of critical processes
• Example: results of RTO/RPO analysis are supplied
• Conduct a process dependency analysis with other business units
BIA – Identify Critical Processes
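The ranking and gap-flagging steps can be sketched in a few lines. This is an illustration under stated assumptions: the `ProcessBia` class, the 1 to 5 impact scale and the "achievable" recovery times are hypothetical; only the process names and RTOs echo the Head Office BURP.

```python
from dataclasses import dataclass


@dataclass
class ProcessBia:
    name: str
    impact_score: int        # hypothetical scale: 1 (low) to 5 (severe)
    rto_hours: float         # Recovery Time Objective
    achievable_hours: float  # hypothetical current estimate of recovery time


def rank_and_flag(processes):
    """Rank by impact (highest first, shorter RTO breaks ties) and flag RTO gaps."""
    ranked = sorted(processes, key=lambda p: (-p.impact_score, p.rto_hours))
    priority = [p.name for p in ranked]
    risks = [p.name for p in ranked if p.achievable_hours > p.rto_hours]
    return priority, risks


processes = [
    ProcessBia("Revenue management", 3, 48, 40),
    ProcessBia("Customer call management", 5, 24, 12),
    ProcessBia("Meter reading & billing", 4, 48, 72),
]
priority, risks = rank_and_flag(processes)
print(priority)  # recovery order
print(risks)     # processes whose RTO cannot currently be met
```

Processes returned in `risks` are the ones to carry forward into the risk assessment.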
BIA - Process Dependency Analysis
[Diagram: processes A, B, C, D and E spread across BU 1, BU 2 and BU 3, labelled with RTOs of 2 hrs, 5 hrs, 3 hrs, 7 hrs and 1 hr]
• Consider the entire chain of processes to be recovered together
• Adjust RTOs if necessary
• Define the Maximum Tolerable Outage (MTO)
• BCM Plans are designed to recover company-critical processes within the MTO
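One possible reading of "adjust RTOs if necessary" is that a process which others depend on must be recovered no later than its dependants. The sketch below applies that rule; it is an assumption, not TPL's documented method. The dependency links, and the pairing of RTO values with the letters A to E, are hypothetical, since the original diagram's arrows did not survive.

```python
def adjust_rtos(rto_hours, depends_on):
    """Tighten each prerequisite's RTO so it is no later than its dependants'."""
    adjusted = dict(rto_hours)
    changed = True
    while changed:  # repeat until no RTO needs tightening
        changed = False
        for process, prerequisites in depends_on.items():
            for prerequisite in prerequisites:
                if adjusted[prerequisite] > adjusted[process]:
                    adjusted[prerequisite] = adjusted[process]
                    changed = True
    return adjusted


# Hypothetical pairing of RTOs (hours) with processes.
rto_hours = {"A": 2, "B": 5, "C": 3, "D": 7, "E": 1}
# Hypothetical links: B needs A, C needs B, E needs D.
depends_on = {"B": ["A"], "C": ["B"], "E": ["D"]}

adjusted = adjust_rtos(rto_hours, depends_on)
print(adjusted)
```

In this example B is tightened to 3 hours because C depends on it, and D to 1 hour because E depends on it.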
• Scenario 1 – If Estimated Recovery Time < MTO
  – Primary site is intact (example: malfunction of IT servers)
  – Execute the IT DR Plan for all IT issues
• Scenario 2 – If Estimated Recovery Time <= MTO
  – Primary site is inaccessible but DR sites are operational; key staff are available; national communication systems are functional (example: cyclone)
  – Minimum and acceptable reputational/financial losses
  – Execute all BCM Plans including IMP, BURPs & IT DR Plan
  – Resume company-critical processes from the DR Sites to a minimum acceptable level
• Scenario 3 – If Estimated Recovery Time > MTO
  – All primary and DR sites are not operational; key staff are unavailable; the national communication system has malfunctioned (example: tsunami)
  – Requires a good communication plan (e.g. radio communication)
  – Potential reputational/financial losses are inevitable
  – Execute BCM plans with a delay
BIA – Disaster Declaration
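The three scenarios can be expressed as a small decision function. This is a sketch, not TPL's declared procedure: the function name and the hour-based inputs are assumptions, and treating an estimate exactly equal to the MTO with the primary site intact as Scenario 1 is a simplification the slides do not spell out.

```python
def classify_scenario(estimated_recovery_hours: float, mto_hours: float,
                      primary_site_intact: bool) -> int:
    """Return the scenario number (1, 2 or 3) for a disruption."""
    if estimated_recovery_hours > mto_hours:
        return 3  # beyond the MTO: execute BCM plans with a delay
    if primary_site_intact:
        return 1  # recover in place, e.g. via the IT DR Plan
    return 2      # primary site lost: resume from the DR sites


MTO_HOURS = 48  # TPL MTO = 2 days

print(classify_scenario(4, MTO_HOURS, primary_site_intact=True))
print(classify_scenario(36, MTO_HOURS, primary_site_intact=False))
print(classify_scenario(96, MTO_HOURS, primary_site_intact=False))
```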
Incident Management Plan
TPL MTO = 2 days
Level 1: Minor Incident – Minor incidents occur only when critical business operations are affected and distribution network recovery or IT problems are expected to be resolved within 3 hours. In this case, the incident is resolved on a ‘business as usual’ basis and no disaster is declared.
Level 2: Major Incident – Major disruption to critical business operations, with system outage expected to last more than 1 day but not more than two business days. In this case, the incident is escalated to a semi-disaster, which is declared by the members of the IMT. At this incident level, only the Generation and Distribution DR Plans will be invoked, as applicable.
Level 3: Critical Incident – Critical disruption to business operations, with system outage expected to last more than two business days. In this case, the incident is escalated and a disaster is declared by the members of the IMT. At this incident level, the BURPs/DRPs of all critical Business Units (IT, Finance, Generation, and Distribution) will be invoked.
Disaster Declaring Criteria
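The declaration criteria can be written as thresholds on the expected outage. The sketch below is a hedged illustration: the function name is hypothetical, a business day is simplified to 24 hours, and outages between 3 hours and 1 day return `None` because the stated criteria do not assign them a level.

```python
from typing import Optional


def incident_level(expected_outage_hours: float) -> Optional[int]:
    """Map an expected outage to incident level 1, 2 or 3."""
    if expected_outage_hours <= 3:
        return 1  # minor: business as usual, no disaster declared
    if 24 < expected_outage_hours <= 48:
        return 2  # major: semi-disaster declared by the IMT
    if expected_outage_hours > 48:
        return 3  # critical: disaster declared, all critical BURPs/DRPs invoked
    return None   # more than 3 hours but not more than 1 day: not covered


for hours in (2, 12, 30, 72):
    print(hours, incident_level(hours))
```

The `None` case is worth resolving in the plan itself so that every outage duration maps to exactly one level.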
| IMP Role | Primary | Deputy |
|---|---|---|
| Incident Controller | Rod Lowe (Distribution Manager) | Michael Lani 'Ahokava (Generation Manager) |
| Communication – Board, Media | John van Brink (Chief Executive Officer) | Steven Esau (Finance Manager) |
| Financial Support | Steven Esau (Finance Manager) | Epoki Veituna (Financial Accountant) |
| Distribution Division | Ian Skelton (Network Planning Manager) | Setitaia Chen (Network Asset Design Manager) |
| Generation Division | Michael Lani 'Ahokava (Generation Manager) | Murray Sheerin (Power Station Superintendent) |
| Risk & Regulatory | Ajith Fernando (Risk & Compliance Manager) | |
| IT Support | Lualala Tapueluelu (IT Supervisor) | Peifaga Fuiono (IT Technician) |
| Legal Support | William Edwards (Company Secretary) | |
| Human Resources | 'Alisi Tu'inukuafe (HR & Administration Manager) | Nau Lavemai (Senior Administration Officer) |
| IMT Assistance | Nau Lavemai (CEO Secretary) | Jane Guttenbeil (Communication Advisor) |
| Media Relations | Jane Guttenbeil (Communication Advisor) | Nau Lavemai (Senior Administration Officer) |
Incident Management Team (IMT)
Command Centre Options
Command Centre Activities:
– Declaring the disaster based on the damage assessment
– Establishing 24-hour communication channels
– Activating, coordinating and monitoring BURPs/DRPs
– Monitoring and acting on staff health & safety
– Media relations
– Making all key decisions for successful restoration
Possible Locations for the Command Centre:
– Head Office (if available)
– Distribution Office (if available)
– Scenic Hotel, Digicel Network Operating Centre
Incident Site or Network / DR Site or Network
DR sites (columns): Distribution Office, Tongatapu; Popua Generation Plant, Tongatapu; Distribution Office, Haapai; Distribution Office, Vavau
Head Office, Tongatapu: DR Site, DR Site
Distribution Network, Tongatapu
There is no DR Network for TPL in Tongatapu. Therefore, if the network is severely damaged by a disaster, TPL technical staff (with assistance from external emergency teams) have to restore the network as quickly as possible.
Popua Generation Plant, Tongatapu
There is no DR Generation Plant for TPL in Tongatapu. Therefore, if the existing generation facility is severely damaged by a disaster, TPL technical staff (with assistance from external emergency teams) have to restore the generation facility as quickly as possible.
Office, Haapai DR Site
Distribution Network/ Generation Plant, Haapai
There is neither DR Generation Plant nor Network for TPL in Haapai. Therefore, if the existing generation facility/ distribution network is severely damaged by a disaster, TPL technical staff have to restore the generation/network facility as quickly as possible working day and night.
Office, Vavau DR Site
Distribution Network/ Generation Plant, Vavau
There is neither DR Generation Plant nor Network for TPL in Vavau. Therefore, if the existing generation facility/ distribution network is severely damaged by a disaster, TPL technical staff have to restore the generation/network facility as quickly as possible working day and night.
Distribution Network/ Generation Plant, Eua
There is neither DR Generation Plant nor Network for TPL in Eua. Therefore, if the existing generation facility/ distribution network is severely damaged by a disaster, TPL technical staff have to restore the generation/network facility as quickly as possible working day and night.
DR Sites
Incident Management Team
Business Unit Managers
Field Supervisors & Foremen
Office Supervisors
Linesmen
Staff
Internal Communication – Call Tree
• Brief description of the disaster, loss of life or injuries, damage summary, response and recovery details
• Location of the DR or Network Site to report to, or instruction to remain at home on standby
• Phone number of the DR Site/Network Supervisor
• Immediate actions to be taken
• Location and time the team should meet at the DR or Network Site
• Instruct everyone notified not to make any statements to the media or on social media.
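A call tree is naturally a tree walked level by level, so that each tier calls the tier below it. The sketch below illustrates the idea with a breadth-first traversal; the parent-to-child links are an assumed reading of the call tree diagram (field supervisors call linesmen, office supervisors call staff), and the names `CALL_TREE` and `call_order` are hypothetical.

```python
from collections import deque

# Assumed structure: who calls whom.
CALL_TREE = {
    "Incident Management Team": ["Business Unit Managers"],
    "Business Unit Managers": ["Field Supervisors & Foremen", "Office Supervisors"],
    "Field Supervisors & Foremen": ["Linesmen"],
    "Office Supervisors": ["Staff"],
}


def call_order(tree, root):
    """Return (caller, contact) pairs in the order the calls are made."""
    order = []
    queue = deque([root])
    while queue:
        caller = queue.popleft()
        for contact in tree.get(caller, []):
            order.append((caller, contact))
            queue.append(contact)  # the contact calls the next tier in turn
    return order


for caller, contact in call_order(CALL_TREE, "Incident Management Team"):
    print(f"{caller} -> {contact}")
```

Logging these pairs during a call-tree activation trial makes it easy to see where a chain broke.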
IMT
Distribution Manager
Finance Manager
Generation Manager
Planning & Design Manager
Business Unit Restoration Plans
Vendors & Suppliers
Police, Fire & Hospital
National Emergency Management Office (Ministry of Works)
Tonga Met Service
Radio Tonga
Tonga Defense Service
Government Organisations
Embassies & High Commissions
Business Unit DR Plans
IT Supervisor
External Communication
• Only the authorised spokespeople shown below are to comment to the media.
  – John van Brink (CEO) – English media
  – Steven Esau (Finance Manager) – Tongan media
• All unauthorised staff should know that if they are contacted by the media they are not to comment, and that their standard reply should be “I’m sorry I can’t help you, I am not the appropriate person to speak to. If you provide me your name and contact number, I’ll get the right person to get in touch with you shortly”.
• Jane Guttenbeil and Nau Lavemai coordinate media relations
Media Relations
Business Unit Recovery Plan
Head Office BURP
Minimum Systems Required at DR Site
| Priority Rank | Process to Recover | Minimum Systems Required | Can TPL Provide Min. Systems Required? | RPO | RTO | Minimum Staff Required |
|---|---|---|---|---|---|---|
| 1 | Customer call management | Voice call & call redirection to mobile phones | Yes (refer IT DR Plan) | NA | 1 Day | Dedicated staff |
| 2 | Meter reading & Billing | Computers with Orien, printer, mobile phones | Yes (refer IT DR Plan) | 1 Day | 2 Days | Billing Supervisor, meter readers |
| 3 | Revenue management | Computers with Orien, mobile phones, email | Yes (refer IT DR Plan) | 1 Day | 2 Days | Credit supervisor and 2 cashiers |
Minimum Staff Required at DR Site
| Division | Manager/Supervisor | Team Members |
|---|---|---|
| IT | Lualala | Pei, Sonia |
| Billing | Ofa | Makueta |
| Revenue | Heta | Ovava, Grace |
Distribution Office BURP
Minimum Systems Required at DR Site
| Priority Rank | Process to Recover | Minimum Systems Required | Can TPL Provide Min. Systems Required? | RPO | RTO | Minimum Staff Required |
|---|---|---|---|---|---|---|
| 1 | Customer faults call management | Call redirection to mobile phones, computer with File Maker software | Yes (refer IT DR Plan) | NA | 1 Day | 2 Call Centre staff |
| 2 | GIS Database Management | Computers with GIS Database, printer, mobile phones | Yes (refer IT DR Plan) | NA | 2 Days | 1 Planning staff |
Minimum Staff Required at DR Site
| Division | Manager/Supervisor | Team Members |
|---|---|---|
| Faults | Malia | Hehea |
| GIS Database | Ian | Semi |
| Priority Rank | Process to Recover | Minimum Systems Required | Can TPL Provide Min. Systems Required? | RPO | RTO | Minimum Staff Required |
|---|---|---|---|---|---|---|
| Option 1 | Power generation to critical organizations | 3 MW backup generator (TPU) and 500 kW generators for outer islands, 2 weeks' fuel supply | No, but will have to restore damaged generators as quickly as possible | NA | 4 Days | Generation Manager and all generation technical staff |
| Option 2 | Power generation to critical organizations | 3 MW backup generator (TPU) and 500 kW generators for outer islands, 2 weeks' fuel supply | If the power station is unrecoverable, generators will be hired from Aggreko NZ | NA | 1 Week | CEO, Power Generation Manager |
Generation DR Plan
| Priority Rank | Process to Recover | Minimum Systems Required | Can TPL Provide Min. Systems Required? | RPO | RTO | Minimum Staff Required |
|---|---|---|---|---|---|---|
| 1 | Distribution network | Network equipment (poles, transformers, insulators etc.) | Yes, Transnet stores has adequate standby supply for disasters. | NA | Urban areas – 2 Days; Villages – 1 Week | Distribution Manager, supervisors, and 100% field staff |
| 2 | Field vehicles, PPE, VHF comms | Assume 50% damaged | Yes, there are VHF systems for 50% of field vehicles | NA | | 100% field staff |
Distribution Network DR Plan
Note: H/O staff are expected to support distribution staff with cooking etc. Meter readers are expected to support linesmen in the field.
Livening Priorities
• Airport – Feeder 3
• Hospital – Feeder 1/2
• NEMO/MET Service – Feeder 1
• Emergency evacuation centres (e.g. schools & churches) if applicable – Feeder 1/3
• Prime Minister’s Office – Feeder 2
• King’s royal palace – Feeder 2
• Water Board – Feeder 2/1
• Defense organisations – Feeder 2
• Communication organisations (e.g. radio, TV)
• Government offices – Feeder 2
• Commercial organisations – Feeder 2
Feeder 1
Feeder 2
Vaini Feeder 1
Emergency Response Plan
Emergency Response Plans
ERPs contain specific emergency procedures that must be followed during a disaster in order to protect people and assets, and to mitigate further damage. For example:
• Building Evacuation
• Damage Assessment
• Spillage (Oil/Chemical/Diesel Fuel) at Power Station
• Fuel Supply Cut Off
• Civil Disturbance
• Hurricane/Storm
• Records Recovery
• Bomb Threat
• Earthquakes
Testing, Maintenance & Auditing the BCM Plans
Testing Business Continuity Plans
• The development of BCM plans is not the end of the BCM process. The emphasis of BCM is upon management
• Without regular maintenance and testing, the plans' usefulness in a real crisis may be severely limited
• Testing practises the company’s ability to recover from an incident
• Test the scenarios identified under the Scenario Planning Section (refer above)
• Tests validate the effectiveness and timeliness (RTO and RPO objectives) of restoration of critical activities
• Tests determine the adequacy of SLAs (service level agreements) with third-party suppliers
• Testing identifies communication breakdowns during call-tree activation trials
• Tests are essential to developing the teamwork, competence, confidence, and knowledge which are vital at the time of an incident
• Frequency of testing can be annual, bi-annual etc., and tests can be announced or unannounced
Types of Tests
1. Table top checks: the simplest and most frequent form of test. The author of the plan simply checks the contents of the plan to ensure that information (e.g. employee names, contact numbers) is up to date.
2. Walk-through: similar to a table top exercise, but involves all named participants in testing a specific disaster scenario. The participants are brought together to role-play their defined resumption procedures alongside those of others. This is the most common method of testing BCM plans as it is relatively inexpensive.
3. Simulation: widens participation to all those who are involved in business recovery, with prior notice. A simulation may include an interruption such as a building fire, in which people do not have access to normal facilities and must relocate to an alternative location (i.e. DR site).
4. Full or live exercises: the most extensive and expensive form of test, so they are normally undertaken on a yearly or bi-yearly basis. This is the largest scale of test and involves the invocation of all BCM plans (i.e. IMP, BURPs & ERPs) to deal with a scenario which normally involves a move to an alternative site where operations are to be resumed. The dependencies and links between BURPs are the focal point of this type of testing.
Post Test Evaluations
The post-test evaluation should consider the following issues:
• Did the plans help or hinder recovery efforts?
• Did people deviate from the plan and, if so, what was the effect of this?
• Were RTO & RPO objectives achieved?
• Where and when did delays occur?
• What did staff do well?
• What did staff do badly?
• How did the expectations differ from what actually happened?
• Were all BURPs integrated sufficiently to achieve recovery?
• What are the priorities for change?
• Is there a paper or audit trail?
• How should changes be implemented?
• Could the observation process be improved?
• Identify and document all the deficiencies, lessons learned etc.
• Update BCM plans if required based on test outcomes
BCM Maintenance & Auditing
• Ongoing BCMS audits should be conducted to evaluate the entire BCM portfolio of plans and identify gaps/inconsistencies.
• If testing has shown that plans have failed to meet the recovery objectives, a fundamental review of the plans may be required.
• In addition, audits should be conducted after a company restructure resulting from a merger or acquisition, after installation of new systems & facilities (e.g. IT) etc.
• The auditor will issue corrective and preventive actions focusing on continuous improvement
• Ensure the BCMS is current and up to date at all times (i.e. BCM plans are living documents)
• Provide ongoing training on the entire BCM process (e.g. BIA, Risk Management etc.)
• Communicate to all employees the BCM initiatives through newsletters, induction programmes etc.
• Cultivate a BCM culture.
Interruptions Outside the Capabilities of BCM Plans
Managing Disasters Outside the Capability of BCM Plans
• Usually BCM plans are prepared to mitigate impacts and resume business operations for manageable interruptions such as fire, flood, supply chain interruptions, IT failure, etc.
• However, there are some disasters (e.g. tsunami) where the company’s current BCM plans may simply be ineffective for the following reasons:
  – Business may be interrupted for a while
  – RTO and RPO objectives may not be achievable (Estimated Recovery Time > Predefined MTO)
  – All DR sites may have been damaged
  – Possible loss of key staff
  – Severe damage to the company’s mission-critical assets (e.g. distribution network)
  – All communication (e.g. TCC, Digicel) has malfunctioned
  – Severe damage to the power station
Managing Disasters Outside the Capability of BCM Plans
In case of such a disaster, a possible action plan would be:
• Shut down power generation for the safety of the public
• Monitor and support staff health & safety and wellbeing
• Coordinate with NEMO for possible recovery efforts (e.g. assistance from TDS to clear fallen trees etc.)
• Broadcast to the public that power is interrupted for a while
• Restore the distribution network (might have to import network equipment from overseas) with some delays
• Establish an alternative fuel supply (if the fuel tank has been damaged by the tsunami)
• If the power station cannot be recovered, import generators from Aggreko Ltd., NZ
• Establish a new DR site to resume key processes (e.g. billing/revenue) to a minimum acceptable level
• Implement current BCM plans with a delay