Avoiding Data Center Disasters

How to Address IT Service Continuity Risk Factors | June 23, 2016

Upload: jesse-richard

Post on 20-Feb-2017


TRANSCRIPT

Page 1: Avoiding Data Center Disasters

How to Address IT Service Continuity Risk Factors | June 23, 2016

Avoiding Data Center Disasters

Page 2: Avoiding Data Center Disasters


Agenda

• Introduction and Housekeeping

• Avoiding Data Center Disasters with Jim Nelson

• How to Respond to a Data Center Disaster with Vincent Geffray

• Q&A Session with the Speakers

The slides and recording will be sent to all registrants early next week.

Page 3: Avoiding Data Center Disasters


Questions

@ITAlerting #ITAWebinar

Page 4: Avoiding Data Center Disasters


Speakers

Vincent Geffray

Senior Director, Product Marketing, Everbridge

Jim Nelson

President, Business Continuity Service, Inc. (BCS)

Page 5: Avoiding Data Center Disasters

Avoiding Data Center Disasters

Page 6: Avoiding Data Center Disasters

© 2016 BCS

Avoiding Data Center Disasters

• Main Causes of Downtime
• Challenges We Face: Complexity & Risks
• Some Warning Signs / Indicators
• Some Suggestions: Quick Fixes / Interventions / Mitigations
• References

Page 7: Avoiding Data Center Disasters

Who has...

• A data center that is brand new (less than 2 years old)?
• A data center that is 5 years old?
• A data center that is 10 years old?
• A data center that is 15 years old?
• A data center that is 20 years old?
• A data center that is more than 20 years old?

Page 8: Avoiding Data Center Disasters

Who has...

• Experienced a scheduled outage in the last 12 months?
• Experienced an unplanned outage in the last 12 months?
• Completed a limited recovery exercise in the last 12 months?
• Completed a comprehensive recovery exercise, including business units, in the last 12 months?
• Updated your BIA / RA in the last 12 months?

Page 9: Avoiding Data Center Disasters

Ponemon Institute

• According to our new study, the average cost of a data center outage has steadily increased from:
  ◦ $505,502 in 2010
  ◦ $690,204 in 2013
  ◦ $740,357 in 2016 (or a 38 percent net change).
• Maximum downtime costs increased 32 percent since 2013 and 81 percent since 2010.
• Maximum downtime costs for 2016 are $2,409,991.

Page 10: Avoiding Data Center Disasters

Ponemon

• UPS failures are the #1 cause of unplanned data center outages, accounting for one-quarter of all such events.
• Cybercrime is the fastest growing cause of data center outages.
  ◦ 2 percent in 2010, 18 percent in 2013, 22 percent in the latest study. Cost impact: $981k
• Human error: 22%. Cost impact: $489k
• IT equipment failure: root cause of 4%.
  ◦ Cost impact: $995k

Page 11: Avoiding Data Center Disasters

Gartner Report

• 80% of outages impacting mission-critical services will be caused by people and process issues, and
• more than 50% of those outages will be caused by change/configuration/release integration and hand-off issues.

Page 12: Avoiding Data Center Disasters

Gartner

• The study highlights 7 items; I highlight 3:
  1. How well are standards defined and followed?
  2. How well are IT services documented or tracked?
  3. What is the degree of business risk that IT organizations will tolerate?

Page 13: Avoiding Data Center Disasters

D&B

According to Dun & Bradstreet, 59% of Fortune 500 companies experience a minimum of 1.6 hours of downtime per week. See more at:

http://www.businesscomputingworld.co.uk/assessing-the-financial-impact-of-downtime/#sthash.KQaVtyLh.dpuf
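To put the 1.6 hours per week figure in perspective, the short sketch below converts it into annual downtime and an availability percentage. It assumes a 52-week year and round-the-clock operation (168 hours per week); the function names are illustrative and not from the presentation.

```python
HOURS_PER_WEEK = 7 * 24   # assumes 24x7 operation
WEEKS_PER_YEAR = 52       # assumption: 52-week year


def annual_downtime_hours(weekly_hours: float) -> float:
    """Scale a weekly downtime figure to a full year."""
    return weekly_hours * WEEKS_PER_YEAR


def availability_percent(weekly_hours: float) -> float:
    """Share of the week the service is up, as a percentage."""
    return 100.0 * (1.0 - weekly_hours / HOURS_PER_WEEK)


if __name__ == "__main__":
    weekly = 1.6  # hours per week, the figure quoted above
    print(f"{annual_downtime_hours(weekly):.1f} hours of downtime per year")
    print(f"{availability_percent(weekly):.2f}% availability")
```

Under these assumptions, 1.6 hours per week works out to roughly 83 hours per year, or about 99.05% availability.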


Page 14: Avoiding Data Center Disasters

Challenges

• BIG DATA (structured, unstructured)
• “Cloud” issues
• Green, PUE, efficiencies
• Competitive Intelligence, Analytics, Dashboards
• Social media, wikis, blogs, video
• Regulatory, compliance
• IA, Data Security, Cyber
• Customers, Budget, resources, $$$, Time

Page 15: Avoiding Data Center Disasters


Page 16: Avoiding Data Center Disasters

Enterprise data center architecture

DC Architecture’s Three-tier Structure Adds Cost / Complexity

• Front end (Web): web serving, web content access. Workload characteristics: little resources other than networking, transient data.
• Mid tier (Application): application business logic; business transactions, commit/update. Workload characteristics: large amounts of CPU and memory.
• Back end (Data base): data base query/response. Workload characteristics: large amounts of CPU and memory, fast storage, low latency networking.

[Diagram: workload isolation between the tiers; storage on NAS filers and FC SAN]

Page 17: Avoiding Data Center Disasters


Page 18: Avoiding Data Center Disasters

Is all DOWNTIME the same?

• Unplanned downtime: The Oopsie!...
• Scheduled downtime: Backups, hardware, software upgrades, preventative maintenance, capacity upgrades, EOL, retiring old systems, network upgrades, cabling changes...

Page 19: Avoiding Data Center Disasters

The great majority of system and data unavailability is the result of planned downtime that occurs due to required maintenance. Although unplanned downtime accounts for only about 10% of all downtime, its unexpected nature means that any single downtime incident may be more damaging to the enterprise, physically and financially, than many occurrences of planned downtime.

Page 20: Avoiding Data Center Disasters

Warning Signs

Data Center Location and Design
Building codes, CRAC / CRAH units on the raised floor, cleaning, vibration, UPS, fire detection / suppression, architectural ceilings, raised floor, walls

Data Center Access
Staff in room, packing materials, combustibles, service personnel, vendors, customer tours...

Page 21: Avoiding Data Center Disasters

Location hazards, risks evaluation

• Sample list of potential natural hazards
  Lions and tigers and bears, oh my! Severe weather, Sandy, lightning, flood, hurricanes, typhoons, super storms, tornado, brush / forest fires, drought, earthquake, Nor'easter, blizzard, ice storm, tsunami, slapping wires, etc.
• Potential man-made hazards
  Violence in the workplace, Boston lock down (SIP), Fukushima, Gulf oil spills, terrorism, flight paths, neighbors, rail, hazmat, chemical spills, pollution, EMF, human error, denied access, occupy “everything”, cyber attacks, RAID controller failure, bad power supplies, UPS failure, battery failures, cooling failure, change control problem, bridge collapse, etc.

Page 22: Avoiding Data Center Disasters

Proximity Evaluation

[Map labels: Tank Farm (three locations), Power Plant]

Page 23: Avoiding Data Center Disasters

Building Evaluations

• Rent / Buy / Build / Cloud Hybrids / Consolidation
  Building codes, existing building history, capacity, # of floors, stacking, security, wind speed, high speed network access, power quality, expansion space and %, architectural features, floor loading, open vertical / slab height, transportation, parking, detention / retention ponds, backup water, cooling capacity, heat sinks, acoustical, bargaining unit or right to work, access to technical talent, access to physical plant / facilities talent, tax breaks, tax incentives, etc.
• TCO: Design, Build, Operate, Decommission or repurpose

Page 24: Avoiding Data Center Disasters

Causes of Outages

Human Error
Reaching design limits, poor change control, weak documentation, lack of standards / processes / best practices, no training, tours and visitor access, accidents, water incursion...

Power Quality Issues
Poor voltage / current / frequency regulation, common mode noise, grounding problems, harmonic issues, EMF, RFI, wireless...

Design and Operation
Most designs are fine; operations and changes compromise the design intent. Poor or deferred maintenance, generator, UPS and batteries, mechanical moving parts, transfer switches, breakers, belts, pumps, capacitors, filters

Environment
Temperature / humidity, contamination, corrosion, pollution...

Page 25: Avoiding Data Center Disasters


Cabling Standards?

Page 26: Avoiding Data Center Disasters


Cabling Standards?

Page 27: Avoiding Data Center Disasters

OK so…

Ideas, suggestions and approaches to address the integration and pervasiveness of technologies throughout the organization.


Page 28: Avoiding Data Center Disasters

Get help!

• The risk of positioning yourselves (ICT) as the “experts” on all things is that you will be viewed as, and held, accountable (read: blamed!). This can be a “resume updating event”.
• I did NOT say it was your fault. I said I am going to BLAME you.

Page 29: Avoiding Data Center Disasters

Suggestions and strategies

• Engage Top Management
• Timing is appropriate
• Engage the “Business”
• Establish a Steering Committee
• Update, conduct a Business Impact Analysis (BIA), Risk Assessment (RA)
• Use a standards approach: ISO, TIA, BCSI, etc.

Page 30: Avoiding Data Center Disasters

Have a Plan

• Align to standards!
• Backups are fine
• Restores are minimum
• Testing is Critical
• Alignment with Business: BCM, RM, IA, Resilience
• The “traditional” BIA and RA falls short unless done well!

Page 31: Avoiding Data Center Disasters

Review

• Location
• Process control
• Documentation
• Train your people
• Mind the little things
• Project Management / Change Control
• Engage the Business
• KYP: Know your personnel

Page 32: Avoiding Data Center Disasters

Organizational approach

• Leverage other organizational areas (such as Risk, Information Assurance, BCM, Internal Audit, EH&S, Security, Insurance)
• Hire it done: engage some experts
• DIY: do it yourself
• What are the trade-offs?
• Disruption vs. Cost Benefit
• Document!

Page 33: Avoiding Data Center Disasters

Some suggestions

• A tidy shop is a happy shop: keep a clean facility
• Hire a technical writer
• Train your people: cost versus benefit
• Noah (2x2): escort everyone
• Clearly define expectations, procedures to define “what to do” and “not do”
• Manage / Monitoring: immediate response

Page 34: Avoiding Data Center Disasters

Some suggestions

• Take readings and monitor the environment
• Integrate with CAB
• Keep copper communication cables away from power cables, motors, transformers
• Handheld devices: distance from EDP equipment

Page 35: Avoiding Data Center Disasters

Some suggestions

• Define and ENFORCE policies & procedures
• No emergency deliveries: keep spares
• No packing / unpacking inside the data center
• No food, drinks, people if possible
• AVOID rushing, running and moving too quickly
• Contractor selection & specifications before and after work
• Walk-off mats
• Anti-static procedures: USE them!

Page 36: Avoiding Data Center Disasters

Some suggestions

• Communicate with your teams
• If you do not define expectations, they will make them up
• Leverage your vendors
• Talk with PEOPLE (yes, that is allowed)
• Go outside your “comfort zone”
• Look for choices

Page 37: Avoiding Data Center Disasters

References

• Gartner: http://www.rbiassets.com/getfile.ashx/42112626510
• Ponemon: http://www.emersonnetworkpower.com/en-US/Brands/Liebert/Documents/White%20Papers/data-center-costs_24659-R02-11.pdf

Page 38: Avoiding Data Center Disasters

• ICOR: http://theicor.org/ and http://www.theicor.org/art/pdfs/ICORCEBrochure-Web.pdf
• Business Computing World: http://www.businesscomputingworld.co.uk/assessing-the-financial-impact-of-downtime/
• Symantec, Dennis Wenk

Page 39: Avoiding Data Center Disasters

Thank you

• Jim Nelson, BCS, Inc.: 866-629-6327
• www.businesscontinuitysvcs.com
• International Consortium for Organizational Resilience (ICOR): 866-765-8321
• www.theicor.org

Page 40: Avoiding Data Center Disasters

Vincent Geffray | Everbridge | @Vgeffray

How to Respond to Data Center Disasters

Page 41: Avoiding Data Center Disasters


What are the Root Causes of an unplanned outage?

69% of the ROOT CAUSE

Page 42: Avoiding Data Center Disasters


What’s the cost of an unplanned data center outage?

In 2016: an average of $7,000 per minute
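As a rough illustration of what a per-minute figure implies, the sketch below multiplies the $7,000 average by an outage duration. The function name and the 90-minute example are assumptions for illustration only; actual costs vary widely by organization and incident.

```python
COST_PER_MINUTE_USD = 7_000  # 2016 average per-minute cost quoted on the slide


def outage_cost_usd(duration_minutes: float,
                    cost_per_minute: float = COST_PER_MINUTE_USD) -> float:
    """Linear estimate: duration multiplied by the average per-minute cost."""
    if duration_minutes < 0:
        raise ValueError("duration_minutes must be non-negative")
    return duration_minutes * cost_per_minute


if __name__ == "__main__":
    # Hypothetical 90-minute unplanned outage.
    print(f"${outage_cost_usd(90):,.0f}")
```

A linear estimate like this is only a starting point for a conversation with the business; it ignores reputational damage and any costs that grow faster than outage duration.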

Page 43: Avoiding Data Center Disasters

© 2016 Everbridge - Critical Communications for IT. Reproduction Prohibited

• Evolution of Enterprise IT

An unplanned outage is a Business issue

Page 44: Avoiding Data Center Disasters


Define Major Incidents with your business

• Disruption of service? → Severe disruption or interruption
• Impact to the business operations? → Large
• Urgency to restore? → High
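The three criteria above can be written down as an explicit rule so that the service desk and the business apply the same definition. This is a minimal sketch: the three-level scale, the `Incident` fields, and the rule that all three criteria must be at the top level are assumptions to be agreed with your business, not something the slide prescribes.

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical three-step scale used for every criterion."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass(frozen=True)
class Incident:
    summary: str
    disruption: Level       # HIGH = severe disruption or interruption
    business_impact: Level  # HIGH = large impact on business operations
    urgency: Level          # HIGH = high urgency to restore


def is_major_incident(incident: Incident) -> bool:
    """Assumed rule: major only when all three criteria are at the top level."""
    return (incident.disruption == Level.HIGH
            and incident.business_impact == Level.HIGH
            and incident.urgency == Level.HIGH)


if __name__ == "__main__":
    outage = Incident("ERP unavailable for all sites",
                      Level.HIGH, Level.HIGH, Level.HIGH)
    slow_report = Incident("Monthly report runs slowly",
                           Level.LOW, Level.MEDIUM, Level.LOW)
    print(is_major_incident(outage), is_major_incident(slow_report))
```

Writing the rule as code forces the ambiguous cases (for example, severe disruption with low urgency) to be discussed before an incident rather than during one.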

Page 45: Avoiding Data Center Disasters


Assess the business impact of each mission-critical service

Page 46: Avoiding Data Center Disasters


Create a Major Incident Response Virtual Team

Major Incident Response Team

The Major Incident Response Team may include members from:
• Network engineering, Application support, DB support, Middleware team, Server/infrastructure team, ERP support, EMR support team
• Service Desk manager
• Customer service manager
• Change manager
• IT Service Director
• 3rd Party vendor (technical contact)

Primary / Secondary

Assign someone to be the Major Incident Manager

Page 47: Avoiding Data Center Disasters


Create a Communication Strategy

+ Who will you be contacting?
  • Your IT experts
  • Senior management
  • Impacted customers
+ How/where do you keep contact information up-to-date?
  • Individuals
  • On-call personnel
  • Static groups
  • Dynamic groups
  • Subscriptions
+ How will you be communicating?
  • Text, SMS
  • Voice, text to voice
  • Emails
  • Mobile app, etc.
+ How often will you be communicating during the crisis?
+ What content will be communicated, to whom?
  • Notification templates
+ What’s the escalation rule if your IT experts don’t respond?
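One way to make the escalation question concrete is to write the rule down as a small procedure. The sketch below is an assumption-laden illustration, not any vendor's API: `Contact`, `escalate`, and the `notify` callback are hypothetical names, and `notify` stands in for whatever tool actually sends the message and reports whether the person acknowledged it.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Contact:
    name: str
    channels: Sequence[str]  # e.g. ("sms", "voice", "email")


def escalate(on_call_chain: Sequence[Contact],
             notify: Callable[[Contact, str], bool],
             max_attempts_per_contact: int = 2) -> Optional[Contact]:
    """Walk the chain in order (primary first) and return the first contact
    who acknowledges, or None if nobody responds."""
    for contact in on_call_chain:
        for _ in range(max_attempts_per_contact):
            for channel in contact.channels:
                # notify() is a hypothetical callback: True means acknowledged.
                if notify(contact, channel):
                    return contact
    return None


if __name__ == "__main__":
    primary = Contact("Primary DBA", ("sms", "voice"))
    secondary = Contact("Secondary DBA", ("sms", "voice"))

    def simulated_notify(contact: Contact, channel: str) -> bool:
        # Simulation only: the primary never answers, the secondary answers by voice.
        return contact is secondary and channel == "voice"

    responder = escalate([primary, secondary], simulated_notify)
    print(responder.name if responder else "No one responded")
```

A real rule would also need a wait time between attempts and a final fallback (for example, the Major Incident Manager); both are left out here to keep the sketch short.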

Page 48: Avoiding Data Center Disasters

Identify Your Contacts and Stakeholders

Cyber attack leading to data breach

Page 49: Avoiding Data Center Disasters


Have Virtual Crisis Rooms Available

+ You need at least 2 conference bridges:
  • Your IT experts
  • Senior Management
+ Provide collaboration tools (Lync, Skype for Business, Slack…)
+ How will you be contacting your IT experts?
  • Manually
  • From your ticketing system
  • From your monitoring tools

Page 50: Avoiding Data Center Disasters

What You Need To Do:
✓ Definition of Major Incident
✓ Assess the Business Impact
✓ Create Major Incident Response Team
✓ Define Communication Strategy
✓ Have 2 Virtual Crisis Rooms

Page 51: Avoiding Data Center Disasters

All-inclusive IT Communications solution
✓ Unlimited Global Voice & SMS
✓ Intelligent Notification Templates
✓ On-call Schedules and Rotations
✓ Automatic Escalation
✓ 1-click Conference Call
✓ Integration with Ticketing Systems
✓ 99.99% true uptime
✓ Open APIs
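For readers comparing uptime figures such as 99.99%, the sketch below converts an uptime percentage into the downtime it still permits. It assumes a 365-day year and says nothing about how any particular vendor measures uptime; the function name is illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumption: 365-day year


def allowed_downtime_minutes(uptime_percent: float,
                             period_minutes: float = MINUTES_PER_YEAR) -> float:
    """Minutes of downtime that remain within a given uptime percentage."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    return period_minutes * (1 - uptime_percent / 100)


if __name__ == "__main__":
    print(f"{allowed_downtime_minutes(99.99):.2f} minutes per year")
```

Under this assumption, 99.99% uptime leaves a little under 53 minutes of downtime per year.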

Page 52: Avoiding Data Center Disasters


Questions

@ITAlerting #ITAWebinar

Page 53: Avoiding Data Center Disasters


Thank you!

www.ITAlerting.com