
Avoiding Data Center Disasters: What Professionals Need To Know
Presented by Jim Nelson
Business Continuity Services, Inc.

Jim Nelson, M.S., MBCP, CDCP, CORP
President, Business Continuity Services, Inc.
Chairman of the Board, The International Consortium for Organizational Resilience

Avoiding Data Center Disasters

Complexity & Risks
Main Causes of Downtime
Some Warning Signs / Indicators
Quick Fixes / Interventions / Mitigations

DRJ Spring 2009, © 2009 BCS

The Complexity of Mission Critical Sites

©2009 ICOR ALL RIGHTS RESERVED - Used with Permission

Risk Factors for Data Centers

Risk factors by origin: human, natural, and data network

Fire*
Water / flooding*
Lightning*
Earthquake*
Heat*
Cold*
Equipment failure
Power failure
Electromagnetic fields
Air pollution and contamination
Unintentional human errors
Vandalism
Sabotage
Terrorism
Network saturation
Virus
Hackers

*Unexpected catastrophic events, normally impossible to predict.

Location Evaluation

Potential natural hazards: lightning, flood, typhoon, forest fires, seismic activity
Potential man-made hazards: flight path, tunnels, lakes, train/airport, RF towers, power distribution network, industrial pollution

=> Convenience should not overrule security!

Proximity Evaluation

Proximity to emergency services
Proximity to neighborhood (urban/industrial)
Proximity to public transport and public roads
Proximity to "high risk" targets: embassies, government buildings, power stations, radio/TV stations

=> Convenience should not overrule security!

Proximity Evaluation

(Photo of a site with nearby hazards labeled "Power Plant" and "Tank Farm")

Building Evaluations

Rent / buy / build?
History of the building (flood, fire, etc.)
Building codes
Level/floor within the building
Space required for each functional area, including potential expansion
Security (windows, perimeter, etc.)
Floor loading
Slab-to-slab height
Power capabilities: redundancy and capacity
Network capabilities: redundancy and diversity
External supply capabilities: refueling of standby generator tanks; delivery of heavy/big equipment and the route to the data center

=> Budget, Budget, Budget (build/run)

Main Causes of Downtime

Human error and hardware/system failure are the main causes of downtime.

(Chart: Causes of downtime and data loss. Source: ZDNet by ADIC)

Predominant Causes of DC Failures

Human error: no or poorly executed processes and work instructions; unauthorized access; accidents; unnoticed alarms
Electromagnetic fields (EMF): high radiation levels from power cables / UPS / transformers / PDU / lighting, etc.
Power quality issues: poor voltage / current / frequency regulation; high level of common- and normal-mode noise; high ground resistance; harmonics
Environmental conditions: temperature / humidity; wrong cooling principles; high air contamination

Warning Signs

Data center location and design
Building codes
CRAC / CRAH units on the raised floor
Access / cleanliness / vibration / EMF / UPS
Fire protection
Ceiling / raised floor / walls
Data center access: packing materials / service technicians / vendors

Warning Signs

Staff in data center
Tours
Equipment / machine room
Room layout: short/long rack rows / open space; electrical panels / transformers / glass
Cleanliness: printers / dirt / zinc whiskers / dust
Emergency deliveries

Cabling Standards?


Use Structured Cabling

Structured cabling provides a neat way of cabling:
Reduced risk of downtime
Easy re-patching
Easy fault finding

Unpacking in the Data Center


The Fire Triangle

Heat source: to reach ignition temperature (open flame, spark/arc)
Oxygen source: need approx. 16% oxygen; open air contains 21% oxygen
Fuel, by physical state:
Gases: natural gas, hydrogen, carbon monoxide
Liquids: gasoline, alcohol, paint, varnish, lacquer
Solids: wood, paper, cloth, plastic, wax
Chemical action

Data Center Fires: Research by Swiss Re

For 80% of the data centers affected by fire, the fire started outside the data center
Fire will affect 0.5% of data centers
Water will affect 2% of data centers

Typically, data centers are not well protected, due to misunderstandings about fire ratings or ineffective fire suppression systems.

A Few Dangers

Water steam (humidity): emitted by wet wall materials (gypsum and brick walls, RF120)
> 4.16 liters / 1.1 gallons of water per ft³ of concrete wall (150 liters / m³, approx. 15%)
A typical room fire at 1100 ºC / 2000 ºF on the other side of a concrete wall will result in about 100–300 ºC / 212–572 ºF in the data center
Environmental contamination (human hazard): 1 kg / 2 lb of PVC burned at 300 ºC / 572 ºF produces 5,800 m³ / 205,000 ft³ of hydrochloric acid gas (roughly the volume of a 2,000 m² / 20,000 sq ft data center)
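The slide mixes metric and imperial figures. The short Python sketch below re-derives those conversions from standard exact unit definitions; the helper names are illustrative, not from the presentation. It shows that 300 ºC is 572 ºF, that 150 liters per m³ is about 4.25 liters (1.12 US gallons) per ft³, close to the slide's 4.16 liters, and that 5,800 m³ spread over 2,000 m² implies a room height of about 2.9 m.

```python
# Exact unit definitions; helper names are illustrative only.
M3_PER_FT3 = 0.028316846592   # 1 cubic foot in cubic meters
M2_PER_FT2 = 0.09290304       # 1 square foot in square meters
L_PER_US_GAL = 3.785411784    # 1 US gallon in liters


def c_to_f(celsius: float) -> float:
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9.0 / 5.0 + 32.0


def per_m3_to_per_ft3(quantity_per_m3: float) -> float:
    """Convert a quantity per cubic meter into the same quantity per cubic foot."""
    return quantity_per_m3 * M3_PER_FT3


def m3_to_ft3(volume_m3: float) -> float:
    """Convert cubic meters to cubic feet."""
    return volume_m3 / M3_PER_FT3


def m2_to_ft2(area_m2: float) -> float:
    """Convert square meters to square feet."""
    return area_m2 / M2_PER_FT2


if __name__ == "__main__":
    water_l_per_ft3 = per_m3_to_per_ft3(150.0)  # wall moisture: 150 L per m3 of concrete
    print(f"Wall moisture: {water_l_per_ft3:.2f} L/ft3 = {water_l_per_ft3 / L_PER_US_GAL:.2f} US gal/ft3")
    print(f"Room fire 1100 C = {c_to_f(1100):.0f} F; data center 100-300 C = {c_to_f(100):.0f}-{c_to_f(300):.0f} F")
    print(f"HCl gas volume 5,800 m3 = {m3_to_ft3(5800):,.0f} ft3")
    print(f"Floor area 2,000 m2 = {m2_to_ft2(2000):,.0f} ft2; implied height {5800 / 2000:.1f} m")
```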

Buttons, Buttons, Buttons

The Dangers of Fire

Human health: direct contact with flames; smoke inhalation (1 kg of burned plastic can pollute a 2,000 m² / 20,000 sq ft data center)
Equipment damage: direct contact with flames; burning plastic components combine with moisture in the air to create an acidic vapor that can damage other equipment away from the flames

Detection Systems

VESDA: Very Early Smoke Detection Apparatus
HSSD: Highly Sensitive Smoke Detection
VIEW: Very Intelligent Early Warning (smoke sensors)
Traditional sensors

Typical Choices for Fire Suppression

Water-based systems: sprinkler systems, water mist
Halocarbons (heat removal): HFC-227ea (FM200); FK-5-1-12 (Novec 1230); HFC-125 (ECARO-25); FE-13
Inert gases (oxygen reduction): Inergen (IG-541); carbon dioxide (CO2); IG-55 (Argonite)
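Inert-gas systems act on the oxygen side of the fire triangle, which cites roughly 16% oxygen as needed and 21% in open air. Under a simplifying assumption of ideal, well-mixed dilution with no leakage (real designs follow the applicable standard and the manufacturer's data), an agent that makes up a fraction c of the room atmosphere leaves 21% × (1 - c) oxygen. The Python sketch below applies that to the 38–43% design range the comparison table lists for Inergen and Argonite; the function names are hypothetical.

```python
AMBIENT_O2 = 0.21      # open air, from the fire-triangle slide
COMBUSTION_O2 = 0.16   # approx. oxygen level quoted on the fire-triangle slide


def o2_after_flooding(agent_fraction: float, ambient_o2: float = AMBIENT_O2) -> float:
    """Oxygen fraction once an inert agent makes up agent_fraction of the
    room atmosphere, assuming ideal, well-mixed dilution and no leakage."""
    if not 0.0 <= agent_fraction < 1.0:
        raise ValueError("agent_fraction must be in [0, 1)")
    return ambient_o2 * (1.0 - agent_fraction)


def min_agent_fraction(target_o2: float, ambient_o2: float = AMBIENT_O2) -> float:
    """Smallest agent fraction that lowers oxygen to target_o2 (same assumptions)."""
    return 1.0 - target_o2 / ambient_o2


if __name__ == "__main__":
    for c in (0.38, 0.43):  # Inergen / Argonite design range from the comparison table
        print(f"{c:.0%} agent -> {o2_after_flooding(c):.1%} O2")
    print(f"Agent needed to reach 16% O2: {min_agent_fraction(COMBUSTION_O2):.1%}")
```

Both ends of the range land around 12–13% oxygen, below the ~16% figure, and in this idealized model roughly 24% agent would already reach that threshold.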

Comparison

Agent        Concentration  NOAEL  Safety margin  ODP  GWP        ALT
FM200        4.5–8.7%       9%     3–20%          0    3,500      33 yr
Novec 1230   4–6%           10%    67–150%        0    1          5 d
FE-13        12–18%         30%    75–85%         0    14,800     243 yr
FE-25        8–11.5%        7.5%   0%             0    3,400      32.6 yr
Inergen      38–43%         43%    0–13%          0    Not rated  Not rated
Argonite     38–43%         43%    0–13%          0    Not rated  Not rated

NOAEL: No Observed Adverse Effect Level (for cardiac sensitization); ODP: Ozone Depletion Potential; GWP: Global Warming Potential; ALT: Atmospheric Lifetime

NOTE: Numbers vary by research study.
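The table does not say how its safety-margin column is derived. One common definition, assumed here, is the headroom between the NOAEL and the design concentration, expressed relative to the design concentration. The Python sketch below (names are illustrative) reproduces the Novec 1230 (67–150%) and Inergen (0–13%) rows under that assumption. It does not reproduce every row: for FE-13 it gives 67–150% rather than 75–85%, and for FE-25 a negative value, which fits the note that numbers vary by study.

```python
def safety_margin(noael_pct: float, design_pct: float) -> float:
    """Assumed definition: (NOAEL - design concentration) / design concentration, in percent."""
    if design_pct <= 0:
        raise ValueError("design concentration must be positive")
    return (noael_pct - design_pct) / design_pct * 100.0


# (lowest design %, highest design %, NOAEL %) taken from the comparison table
AGENTS = {
    "Novec 1230": (4.0, 6.0, 10.0),
    "Inergen": (38.0, 43.0, 43.0),
}

if __name__ == "__main__":
    for name, (low, high, noael) in AGENTS.items():
        # The highest design concentration leaves the smallest margin.
        print(f"{name}: {safety_margin(noael, high):.0f}% to {safety_margin(noael, low):.0f}%")
```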

Quick Fix Options

Keep a clean facility
Ensure hand-held fire extinguishers are the correct type and charged
Train anyone with access to the DC on their use at least annually
Escort required
Define what they should do and not do
Continuous monitoring and immediate response

Thermographic Scanning

Electromagnetic Spectrum

Electric vs. Magnetic Fields

What causes EMF / EMI?

Current-carrying conductors: high tension, transformers, UPS, gen sets, power cables
Lightning strikes
Wireless communications: Wi-Fi, Canopy, Bluetooth; radar; transmitting towers (TV/radio/GSM)

EMF Effects on Integrated Circuits

Antenna effect (photo: detail of a damaged strip in an integrated circuit)
Electromigration (photo: microscopic detail of a broken strip)
Hot electron effect: random transistor malfunction

Don’t See It, Don’t Smell It... But It Is There

Increasing power consumption in data centers requires high-current power cables
Limit per IEC/EN 61000-4-8: only up to 37.5 mG!
Limit per the National Council on Radiation Protection and Measurements for human safety: up to 10 mG!!

(Figure: magnetic field around a 5000 A cable, 1730 mG at 1 m, 432 mG at 2 m, and 192 mG at 3 m)
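The three readings in the figure fall off roughly with the square of the distance: 1730/2² ≈ 432 and 1730/3² ≈ 192. Assuming that inverse-square fit also holds beyond 3 m (an assumption; the real fall-off depends on conductor geometry and on how well return currents cancel), a short Python sketch can estimate where the field would drop to the 37.5 mG and 10 mG figures quoted on the slide. The function names are hypothetical.

```python
import math

B_AT_1M_MG = 1730.0  # reading at 1 m from the 5000 A cable, from the slide


def field_mg(distance_m: float, b_ref_mg: float = B_AT_1M_MG, ref_m: float = 1.0) -> float:
    """Estimated field at distance_m, assuming B falls off as 1/d^2 (fit to the slide's readings)."""
    if distance_m <= 0:
        raise ValueError("distance must be positive")
    return b_ref_mg * (ref_m / distance_m) ** 2


def distance_for_limit_m(limit_mg: float, b_ref_mg: float = B_AT_1M_MG, ref_m: float = 1.0) -> float:
    """Distance at which the estimated field drops to limit_mg, under the same 1/d^2 assumption."""
    return ref_m * math.sqrt(b_ref_mg / limit_mg)


if __name__ == "__main__":
    for d in (1, 2, 3):
        print(f"{d} m: {field_mg(d):.0f} mG")
    for label, limit in (("IEC/EN 61000-4-8", 37.5), ("human-safety figure", 10.0)):
        print(f"{label} ({limit} mG) reached at ~{distance_for_limit_m(limit):.1f} m")
```

Under this fit, 37.5 mG would be reached at about 6.8 m and 10 mG at about 13.2 m from the cable, which illustrates why keeping a good distance appears among the quick fixes.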

Quick Fix Options

Take readings and monitor
Keep copper communication cables away from power cables, motors, and transformers
Keep a good distance
Keep handheld devices away from EDP equipment

EH&S Definitions for Noise

Noise properties: frequency content, amplitude, temporal variations, directionality
Source: emits the noise; measured as "sound power level"
Path: the area throughout which the emitted noise is transmitted; includes reflections, absorptions, and collisions with other noise sources, and the noise might be scattered
Receiver: immission (eye-mission), the reverse of emission; typically human beings (or test equipment, e.g., a microphone); measured as "sound pressure level"

General Thresholds for Humans

Threshold of pain: 134 dB(A)
Hearing damage during short exposure: 120 dB(A)
Hearing damage during long exposure: 90 dB(A)
Normal talking at 1 meter (3 ft) distance: 20–40 dB(A)
Jackhammer/disco: approx. 100 dB(A)

Effects of (Excessive) Noise

Auditory effect (physiological): hearing damage
Non-auditory effects: annoyance, headache, stress, concentration loss, diminished productivity, interference with effective communication, low job satisfaction and motivation

Exposure Limitations

Two main regulatory bodies: OSHA (USA) and EU regulations (Europe and Asia)
OSHA auditory limit set at 90 dB(A) over 8 hr
EU auditory limit set at 87 dB(A) over 8 hr
Noise levels of 80–85 dB(A) require employers to start taking precautions: noise removal; protection devices for employees
Limit for non-auditory effects set at 55–70 dB(A)
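The slide gives 8-hour limits but not how shorter, louder exposures are counted. As an addition beyond the slide: OSHA (29 CFR 1910.95) halves the permitted time for every 5 dB above 90 dB(A), while the EU noise directive (2003/10/EC) uses an equal-energy rule in which every 3 dB doubles the exposure. The minimal Python sketch below covers both rules; the function names and the example shift are hypothetical, and it ignores details such as which sound levels each regulation includes in the calculation and the EU allowance for hearing protection.

```python
import math


def osha_permissible_hours(level_dba: float) -> float:
    """OSHA permissible duration: T = 8 / 2**((L - 90) / 5), i.e. a 5 dB exchange rate."""
    return 8.0 / 2 ** ((level_dba - 90.0) / 5.0)


def eu_daily_exposure_dba(exposures: list[tuple[float, float]]) -> float:
    """Equal-energy (3 dB) 8-hour level from (level in dB(A), hours) pairs."""
    if not exposures:
        raise ValueError("need at least one (level, hours) pair")
    energy = sum((hours / 8.0) * 10 ** (level / 10.0) for level, hours in exposures)
    return 10.0 * math.log10(energy)


if __name__ == "__main__":
    # Hypothetical shift: 4 hours at 95 dB(A); the rest of the day is treated as negligible.
    shift = [(95.0, 4.0)]
    print(f"OSHA permits {osha_permissible_hours(95.0):.1f} h at 95 dB(A)")
    print(f"Equal-energy daily level: {eu_daily_exposure_dba(shift):.1f} dB(A) (EU figure: 87)")
```

Four hours at 95 dB(A) uses the full OSHA allowance, yet under the equal-energy rule the same four hours average to about 92 dB(A) over 8 hours, above the 87 dB(A) figure, so the exchange rate matters as much as the headline limit.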

Quick Fix Options

Noise can lead to errors and mistakes
Provide for breaks, less time in the area
Reduce the noise source: e.g., air conditioner in a different room
Reduce the path: installation of acoustic absorbing material
Protect the receiver: protective wear, e.g., earplugs
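Reducing the source helps because noise from independent sources adds on an energy basis rather than arithmetically: L_total = 10·log10(Σ 10^(L_i/10)). The Python sketch below shows that arithmetic; the source levels are hypothetical, not measurements from the presentation.

```python
import math


def combine_levels_db(levels_db: list[float]) -> float:
    """Total level of independent (incoherent) sources: 10*log10(sum(10**(L/10)))."""
    if not levels_db:
        raise ValueError("need at least one level")
    return 10.0 * math.log10(sum(10 ** (level / 10.0) for level in levels_db))


if __name__ == "__main__":
    # Hypothetical machine-room sources in dB(A).
    servers, air_conditioner = 80.0, 80.0
    with_ac = combine_levels_db([servers, air_conditioner])
    without_ac = combine_levels_db([servers])
    print(f"With AC: {with_ac:.1f} dB(A); AC moved to another room: {without_ac:.1f} dB(A)")
```

Two equal 80 dB(A) sources combine to about 83 dB(A), so moving one of them out lowers the level by only about 3 dB; removing the dominant source has the biggest effect.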

Contamination Categories

Gases: corrosive compounds, volatile organics
Solids (particulates): abrasive, hygroscopic, corrosive, conductive
Liquids: direct water, vaporized water (mist, humidity from the sea)

Gases

Corrosive gases such as chlorine, nitrogen dioxide, sulfur dioxide, etc., can cause corrosion
Corrosion is an electrochemical reaction, which often needs some level of humidity
Corrosion leads to electrical bridging / short circuits, intermittent electrical leakage, and open circuits (extreme cases)

Solids (Particulates)

Come from various sources: dust; humans (we generate about 100,000 particles per cubic foot per minute); metallic whiskers
Typical characteristics leading to reliability problems in IT equipment: hygroscopic (absorption of water molecules); abrasive ("scratching"); corrosive; conductive; thermal inefficiencies

Solids: Area of Impact

Air intake / filters of IT equipment and air conditioning: less thermal flow / inefficiencies
Internal heat sinks and components: less thermal flow / overheating
Electrical connectors
Removable media
Optical drives
Magnetic read/write heads

Solids: Impact

Ensure proper cleaning of ducts: ducts / vents collect a lot of dust, which most of the time is not in clear view

Quick Fix Options

Do not allow packing material to go inside the data center
No food, drinks, etc.
Don't leave doors open
No running
Ensure contractors clean up after their work
When construction work is required, ensure proper screen protection
Use sticky anti-static doormats

Cleaning

Quick Fix Options

Policies & procedures
Change control: plan for what you plan to do, implement, track effects, back-out plan
Training, training, training
Documentation
Separation by function
Conduct Computational Fluid Dynamics (CFD) studies

What do these jobs have in common?

They need to be certified professionals.

Certification Required?

What certification do you ask him for when you let him run your mission-critical data center?

Room Construction Modeling


Cabinet Modeling
