TRANSCRIPT
Avoiding Data Center Disasters: What Professionals Need To Know
Presented by Jim Nelson
Business Continuity Services, Inc.
Jim Nelson, M.S., MBCP, CDCP, CORP, President, Business Continuity Services, Inc.
Chairman of the Board, The International Consortium for Organizational Resilience (ICOR)
Avoiding Data Center Disasters
Complexity & Risks
Main Causes of Downtime
Some Warning Signs / Indicators
Quick Fixes / Interventions / Mitigations
DRJ Spring 2009, © 2009 BCS
The Complexity of Mission Critical Sites
©2009 ICOR ALL RIGHTS RESERVED
Risk Factors for Data Centers
Grouped by origin: Human Origin, Natural Origin, Data Network Origin
*Fire
*Heat
*Cold
*Water/flooding
*Lightning
*Earthquake
Equipment failure
Vandalism
Sabotage
Terrorism
Unintentional human errors
Electromagnetic fields
Air pollution and contamination
Power failure
Network saturation
Virus
Hackers
*Unexpected catastrophic events, normally impossible to predict.
Location Evaluation
Potential natural hazards: lightning, flood, typhoon, forest fires, seismic activity
Potential man-made hazards: flight path, tunnels, lakes, train/airport, RF towers, power distribution network, industrial pollution
=> Convenience should not overrule security!
Proximity Evaluation
Proximity to emergency services
Proximity to neighborhood (urban/industrial)
Proximity to public transport and public roads
Proximity to "high risk" targets: embassies, government buildings, power stations, radio/TV stations
=> Convenience should not overrule security!
Proximity Evaluation
[Image labels: Power Plant; Tank Farm (three locations)]
Building Evaluations
Rent/buy/build?
History of the building (flood, fire, etc.)
Building codes
Level/floor within the building
Space required for each functional area, including potential expansion
Security (windows, perimeter, etc.)
Floor loading
Slab-to-slab height
Power capabilities: redundancy and capacity
Network capabilities: redundancy and diversity
External supply capabilities: re-fueling of standby generator tanks; delivery of heavy/big equipment and the route to the data center
=> Budget, Budget, Budget (build/run)
Main Causes of Downtime
Human error and hardware/system failure are the main causes for downtime
[Chart: Causes of downtime and data loss (Source: ZDNet by ADIC)]
Predominant Causes of DC Failures
Human error: no or poorly executed processes and work instructions; unauthorized access; accidents; unnoticed alarms
Electromagnetic fields (EMF): high radiation levels from power cables / UPS / transformers / PDU / lighting, etc.
Power quality issues: poor voltage / current / frequency regulation; high level of common- and normal-mode noise; high ground resistance; harmonics
Environmental conditions: temperature / humidity; wrong cooling principles; high air contamination
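Of the power-quality items above, harmonics are usually reported as a single number, total harmonic distortion (THD): the RMS sum of the harmonic components relative to the fundamental. A minimal sketch; the voltage readings are hypothetical, and a real power analyzer measures many more harmonic orders:

```python
import math
from typing import Sequence


def thd_percent(fundamental_rms: float, harmonic_rms: Sequence[float]) -> float:
    """Total harmonic distortion (THD_F) in percent.

    THD_F = sqrt(V2^2 + V3^2 + ...) / V1, where V1 is the RMS value of the
    fundamental and V2, V3, ... are the RMS values of the harmonic components.
    """
    if fundamental_rms <= 0:
        raise ValueError("fundamental_rms must be positive")
    harmonic_total = math.sqrt(sum(v * v for v in harmonic_rms))
    return 100.0 * harmonic_total / fundamental_rms


if __name__ == "__main__":
    # Hypothetical readings: 230 V fundamental, 3rd harmonic 11.5 V, 5th harmonic 9.2 V.
    print(f"THD = {thd_percent(230.0, [11.5, 9.2]):.1f}%")
```

With these made-up readings the script prints `THD = 6.4%`.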
Warning Signs
Data center location and design: building codes; CRAC/CRAH units on the raised floor; access / cleanliness / vibration / EMF / UPS
Fire protection: ceiling / raised floor / walls
Data center access: packing materials / service technicians / vendors
Warning Signs
Staff in the data center
Tours
Equipment / machine room
Room layout: short/long rack rows / open space
Electrical panels / transformers / glass
Cleanliness: printers / dirt / zinc whiskers / dust
Emergency deliveries
Cabling Standards?
Use Structured Cabling
Structured cabling provides a neat way of cabling:
Reduced risk of downtime
Easy re-patching
Easy fault finding
Unpacking in the Data Center
July 11, 2005
The Fire Triangle
Heat source: to reach ignition temperature (open flame, spark/arc)
Oxygen source: need approx. 16% oxygen; open air contains 21% oxygen
Fuel (physical state of fuel):
Gases: natural gas, hydrogen, carbon monoxide
Liquids: gasoline, alcohol, paint, varnish, lacquer
Solids: wood, paper, cloth, plastic, wax
Chemical action
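The oxygen side of the triangle is what inert-gas systems attack. As a rough sketch, assuming an oxygen-free agent mixes uniformly with the room air (ignoring leakage, hold time, and the design standards real systems follow), the remaining oxygen is 21% x (1 - agent fraction), so getting below the slide's ~16% takes roughly a quarter of the room atmosphere to be agent:

```python
# Figures from the slide: open air ~21% O2; combustion needs roughly 16% O2.
AMBIENT_O2_PCT = 21.0
COMBUSTION_O2_PCT = 16.0


def oxygen_after_dilution(agent_fraction: float,
                          ambient_o2_pct: float = AMBIENT_O2_PCT) -> float:
    """O2 concentration (% by volume) once an oxygen-free agent makes up
    `agent_fraction` of the room atmosphere (idealized, uniform mixing)."""
    if not 0.0 <= agent_fraction < 1.0:
        raise ValueError("agent_fraction must be in [0, 1)")
    return ambient_o2_pct * (1.0 - agent_fraction)


def min_agent_fraction(target_o2_pct: float = COMBUSTION_O2_PCT,
                       ambient_o2_pct: float = AMBIENT_O2_PCT) -> float:
    """Smallest agent fraction that brings O2 down to `target_o2_pct`."""
    return 1.0 - target_o2_pct / ambient_o2_pct


if __name__ == "__main__":
    print(f"Agent needed to reach 16% O2: {min_agent_fraction():.1%}")
    for fraction in (0.30, 0.40):
        print(f"{fraction:.0%} agent -> {oxygen_after_dilution(fraction):.1f}% O2")
```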
Data Center Fires: Research by Swiss Re
In 80% of the data centers affected by fire, the fire started outside the data center
Fire will affect 0.5% of data centers
Water will affect 2% of data centers
Typically, data centers are not well protected because of misunderstandings about fire ratings or ineffective fire suppression systems
A Few Dangers
Water steam (humidity): emitted by wet materials in walls (gypsum and brick walls, RF120)
Approx. 4.16 liters / 1.1 gallons of water per ft³ of concrete wall (150 liters per m³, approx. 15% by volume)
A typical room fire at 1100 °C / 2000 °F on the other side of a concrete wall will result in about 100–300 °C / 212–572 °F in the data center
Environmental contamination (human hazard): 1 kg (approx. 2 lbs) of PVC burned at 300 °C / 572 °F can contaminate 5,800 m³ / 205,000 ft³ with hydrochloric acid gas (roughly the volume of a 2,000 m² / 20,000 sq ft data center)
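To relate the 5,800 m³ figure to a specific room, divide the room volume by it. The slide's 2,000 m² equivalence implies a room height of about 2.9 m (5,800 / 2,000). A sketch that assumes contamination scales linearly with the mass burned, which is a simplification:

```python
# Slide figure: 1 kg of PVC burned at 300 °C can contaminate ~5,800 m³ of air
# with hydrochloric acid gas.
CONTAMINATED_M3_PER_KG_PVC = 5_800.0


def room_volume_m3(floor_area_m2: float, ceiling_height_m: float) -> float:
    return floor_area_m2 * ceiling_height_m


def pvc_kg_to_contaminate(floor_area_m2: float, ceiling_height_m: float) -> float:
    """Mass of burned PVC (kg) whose HCl gas would fill the whole room,
    scaled linearly from the slide's 5,800 m³ per kg figure."""
    return room_volume_m3(floor_area_m2, ceiling_height_m) / CONTAMINATED_M3_PER_KG_PVC


if __name__ == "__main__":
    # The slide's 2,000 m² example implies a room height of about 2.9 m.
    print(f"{pvc_kg_to_contaminate(2_000, 2.9):.2f} kg")
    # A hypothetical 500 m² room with a 3 m ceiling.
    print(f"{pvc_kg_to_contaminate(500, 3.0):.2f} kg")
```

Under that assumption, about a quarter of a kilogram of burning cable insulation would be enough for the smaller hypothetical room.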
Buttons, Buttons, Buttons
The Dangers of Fire
Human health: direct contact with flames; smoke inhalation (1 kg of burned plastic can pollute a 2,000 m² / 20,000 sq ft data center)
Equipment damage: direct contact with flames; burning plastic components combine with moisture in the air to create an acidic vapor that can damage equipment far from the flames
Detection Systems
VESDA: Very Early Smoke Detection Apparatus
HSSD: Highly Sensitive Smoke Detection
VIEW: Very Intelligent Early Warning
Smoke sensors
Traditional sensors
Typical Choices for Fire Suppression
Water-based systems: sprinkler systems, water mist
Halocarbons (heat removal):
HFC-227ea; FM200 / HFC-227
FK-5-1-12; Novec 1230
HFC-125; Ecaro-25; FE-13
Inert gases (oxygen reduction):
Inergen (IG-541)
Carbon dioxide (CO2)
IG-55; Argonite
Comparison
Agent      | Concentration | NOAEL | Safety margin | ODP | GWP       | ALT
FM200      | 4.5 – 8.7%    | 9%    | 3 – 20%       | 0   | 3,500     | 33 yr
Novec 1230 | 4 – 6%        | 10%   | 67 – 150%     | 0   | 1         | 5 d
FE-13      | 12 – 18%      | 30%   | 75 – 85%      | 0   | 14,800    | 243 yr
FE-25      | 8 – 11.5%     | 7.5%  | 0%            | 0   | 3,400     | 32.6 yr
Inergen    | 38 – 43%      | 43%   | 0 – 13%       | 0   | Not rated | Not rated
Argonite   | 38 – 43%      | 43%   | 0 – 13%       | 0   | Not rated | Not rated
NOAEL: No Observed Adverse Effect Level (for cardiac sensitization)
ODP: Ozone Depletion Potential
GWP: Global Warming Potential
ALT: Atmospheric Lifetime
NOTE!!: Numbers vary per research study
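Working backwards from the table, the safety-margin column for Novec 1230 and Inergen is consistent with (NOAEL - concentration) / concentration. The FM200 and FE-13 rows do not reproduce exactly, in line with the note that numbers vary per study, and a negative result (design concentration above the NOAEL, as with FE-25) may be what the table reports as 0%. A sketch of that calculation, assuming this formula:

```python
def safety_margin(design_pct: float, noael_pct: float) -> float:
    """Headroom between the NOAEL and the design concentration, as a fraction
    of the design concentration. Negative means the design concentration
    exceeds the NOAEL."""
    if design_pct <= 0:
        raise ValueError("design_pct must be positive")
    return (noael_pct - design_pct) / design_pct


# Concentration ranges and NOAEL values copied from the comparison table.
AGENTS = {
    "Novec 1230": ((4.0, 6.0), 10.0),
    "Inergen": ((38.0, 43.0), 43.0),
}

if __name__ == "__main__":
    for name, ((low, high), noael) in AGENTS.items():
        # The highest design concentration leaves the smallest margin.
        worst = safety_margin(high, noael)
        best = safety_margin(low, noael)
        print(f"{name}: {worst:.0%} to {best:.0%}")
```

This prints `Novec 1230: 67% to 150%` and `Inergen: 0% to 13%`, matching those two rows.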
Quick Fix Options
Keep a clean facility
Ensure hand-held fire extinguishers are the correct type and charged
Train anyone with access to the DC on their use at least annually
Escort required: define what visitors should and should not do
Continuous monitoring and immediate response
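"Unnoticed alarms" appear among the predominant causes of failure, and continuous monitoring only helps if someone responds. One way to enforce an immediate response is to escalate any alarm left unacknowledged past a timeout. A minimal sketch; the `Alarm` record, the 5-minute timeout, and the alarm names are hypothetical, not taken from any particular monitoring product:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Alarm:
    source: str                                # hypothetical label, e.g. "UPS on battery"
    raised_at: datetime
    acknowledged_at: Optional[datetime] = None


def alarms_to_escalate(alarms: List[Alarm], now: datetime,
                       ack_timeout: timedelta = timedelta(minutes=5)) -> List[Alarm]:
    """Return alarms that nobody has acknowledged within `ack_timeout`."""
    return [
        alarm for alarm in alarms
        if alarm.acknowledged_at is None and now - alarm.raised_at > ack_timeout
    ]


if __name__ == "__main__":
    now = datetime(2009, 3, 1, 12, 0)  # arbitrary fixed time for a reproducible demo
    alarms = [
        Alarm("UPS on battery", now - timedelta(minutes=12)),
        Alarm("CRAC-2 high temperature", now - timedelta(minutes=2)),
        Alarm("Water detected under raised floor", now - timedelta(minutes=30),
              acknowledged_at=now - timedelta(minutes=25)),
    ]
    for alarm in alarms_to_escalate(alarms, now):
        print(f"ESCALATE: {alarm.source}")
```

Only the 12-minute-old, unacknowledged UPS alarm is escalated; the recent temperature alarm is still inside its window and the water alarm was already acknowledged.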
Thermographic Scanning
Electromagnetic Spectrum
Electric vs. Magnetic Fields
What Causes EMF / EMI?
Current-carrying conductors: high tension, transformers, UPS, gen sets, power cables
Lightning strikes
Wireless communications: Wi-Fi, Canopy, Bluetooth; radar; transmitting towers (TV/radio/GSM)
EMF Effects on Integrated Circuits
Antenna effect (image: detail of a damaged strip in an integrated circuit)
Electromigration (image: microscopic detail of a broken strip)
Hot electron effect: random transistor malfunction
Don’t See It, Don’t Smell It... But It Is There
Increasing power consumption in data centers requires high-current power cables
Limit per standard IEC/EN 61000-4-8: only up to 37.5 mG!
Limit per the National Council on Radiation Protection and Measurements for human safety: up to 10 mG!!
Measured field around a 5000 A cable: 1730 mG at 1 m, 432 mG at 2 m, 192 mG at 3 m
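The three readings fall off roughly as 1/r² (1730 / 2² ≈ 432, 1730 / 3² ≈ 192). An isolated long conductor falls off only as 1/r, while bundled conductors with cancelling currents fall off faster, so the trend depends on the cable arrangement. Assuming the measured 1/r² trend continues beyond 3 m, this sketch estimates how far copper cabling or people need to be from the cable to get under the two limits quoted above:

```python
import math

# Readings from the slide near a 5000 A power cable: distance (m) -> field (mG).
READINGS_MG = {1.0: 1730.0, 2.0: 432.0, 3.0: 192.0}


def decay_exponent(r_near: float, b_near: float, r_far: float, b_far: float) -> float:
    """Exponent n in B = k / r**n that fits two readings."""
    return math.log(b_near / b_far) / math.log(r_far / r_near)


def distance_for_limit(r_ref: float, b_ref: float, limit_mg: float, n: float) -> float:
    """Distance at which B = k / r**n falls to `limit_mg` (extrapolated)."""
    return r_ref * (b_ref / limit_mg) ** (1.0 / n)


if __name__ == "__main__":
    n = decay_exponent(1.0, READINGS_MG[1.0], 3.0, READINGS_MG[3.0])
    print(f"fitted exponent n = {n:.2f}")  # the slide's numbers give n close to 2
    for limit in (37.5, 10.0):  # IEC/EN 61000-4-8 and NCRP figures from the slide
        d = distance_for_limit(1.0, READINGS_MG[1.0], limit, n)
        print(f"{limit:>5} mG reached at roughly {d:.1f} m (extrapolated)")
```

With these readings the extrapolation gives roughly 7 m for 37.5 mG and about 13 m for 10 mG; fresh readings on site are more reliable than the extrapolation.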
Quick Fix Options
Take readings and monitor
Keep copper communication cables away from power cables, motors, transformers
Keep a good distance: handheld devices away from EDP equipment
EH&S Definitions for Noise
Noise properties: frequency content, amplitude, temporal variations, directionality
Source: emits the noise; measured as "sound power level"
Path: the area through which the emitted noise is transmitted; includes reflections, absorption, and collisions with other noise sources, and the noise might be scattered
Receiver: immission ("eye-mission"), the reverse of emission; typically human beings (or test equipment, e.g., a microphone); measured as "sound pressure level"
General Thresholds for Humans
Threshold of pain: 134 dB(A)
Hearing damage during short exposure: 120 dB(A)
Hearing damage during long exposure: 90 dB(A)
Normal talking at 1 meter (3 ft) distance: 20–40 dB(A)
Jackhammer/disco: approx. 100 dB(A)
Effects of (Excessive) Noise
Auditory effect (physiological): hearing damage
Non-auditory effects: annoyance, headache, stress, concentration loss, diminished productivity, interference with effective communication, low job satisfaction and motivation
Exposure Limitations
Two main regulatory bodies: OSHA (USA); EU regulations (Europe and Asia)
OSHA auditory limit: 90 dB(A) over 8 hours
EU auditory limit: 87 dB(A) over 8 hours
Noise levels of 80–85 dB(A) require employers to start taking precautions: noise removal, protection devices for employees
Limit for non-auditory effects: 55–70 dB(A)
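The 8-hour limits translate into permitted exposure times at other levels. OSHA's 90 dB(A) limit uses a 5 dB exchange rate (29 CFR 1910.95): every 5 dB increase halves the permitted time. EU rules (Directive 2003/10/EC) express the 87 dB(A) limit as a daily exposure level on an equal-energy basis, which works out to roughly a 3 dB exchange rate. A sketch comparing the two for a constant noise level, ignoring hearing protection and mixed exposures during the day:

```python
def osha_permissible_hours(level_dba: float) -> float:
    """OSHA permitted duration: 8 h at 90 dB(A), halved for every 5 dB above."""
    return 8.0 / 2 ** ((level_dba - 90.0) / 5.0)


def eu_equal_energy_hours(level_dba: float, limit_dba: float = 87.0) -> float:
    """Time at a constant level that reaches an 8 h equal-energy exposure of
    `limit_dba` (each ~3 dB increase halves the time)."""
    return 8.0 * 10 ** ((limit_dba - level_dba) / 10.0)


if __name__ == "__main__":
    for level in (85, 90, 95, 100):
        print(f"{level} dB(A): OSHA {osha_permissible_hours(level):.1f} h, "
              f"EU-style {eu_equal_energy_hours(level):.1f} h")
```

The gap widens quickly: at 100 dB(A) the OSHA rule allows 2 hours, while the equal-energy calculation allows well under half an hour.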
Quick Fix Options
Noise can lead to errors and mistakes
Provide for breaks, less time in the area
Reduce the noise source: air conditioner in a different room
Reduce the path: installation of acoustic absorbing material
Protect the receiver: protective wear, e.g., earplugs
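Removing the loudest source pays off because incoherent sources add on an energy basis: L_total = 10 log10(sum of 10^(L_i/10)). A sketch with hypothetical levels at one operator position (two racks at 75 dB(A) and an air conditioner at 80 dB(A)), assuming the sources are incoherent:

```python
import math
from typing import Sequence


def combine_levels(levels_db: Sequence[float]) -> float:
    """Combined sound pressure level of incoherent sources at one position:
    L_total = 10 * log10(sum(10 ** (L_i / 10)))."""
    if not levels_db:
        raise ValueError("need at least one level")
    return 10.0 * math.log10(sum(10.0 ** (level / 10.0) for level in levels_db))


if __name__ == "__main__":
    # Hypothetical levels at one operator position, in dB(A).
    with_ac = [75.0, 75.0, 80.0]    # two racks plus an air conditioner
    without_ac = [75.0, 75.0]       # air conditioner moved to a different room
    print(f"with air conditioner:    {combine_levels(with_ac):.1f} dB(A)")
    print(f"without air conditioner: {combine_levels(without_ac):.1f} dB(A)")
```

Moving the hypothetical 80 dB(A) unit out drops the total from about 82.1 to about 78.0 dB(A), more than halving the sound energy at that position.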
Contamination Categories
Gases: corrosive compounds, volatile organics
Solids (particulates): abrasive, hygroscopic, corrosive, conductive
Liquids: direct water, vaporized water (mist, humidity from sea)
Gases
Corrosive gases such as chlorine, nitrogen dioxide, sulfur dioxide, etc., can cause corrosion
Corrosion is an electrochemical reaction that often needs some level of humidity
Corrosion leads to: electrical bridging / short circuits; intermittent electrical leakage; open circuits (extreme cases)
Solids (Particulates)
Come from various sources: dust; humans (we generate about 100,000 particles per cubic foot per minute); metallic whiskers
Typical characteristics leading to reliability problems in IT equipment:
Hygroscopic (absorption of water molecules)
Abrasive ("scratching")
Corrosive
Conductive
Thermal inefficiencies
Solids: Areas of Impact
Air intakes / filters of IT equipment and air-conditioning: less thermal flow / inefficiencies
Internal heat sinks and components: less thermal flow / overheating
Electrical connectors
Removable media
Optical drives
Magnetic read/write heads
Solids: Impact
Ensure proper cleaning of ducts: ducts / vents collect a lot of dust, which most of the time is not in clear view
Quick Fix Options
Do not allow packing material inside the data center
No food, drinks, etc.
Don’t leave doors open
No running
Ensure contractors clean up after their work
When construction work is required, ensure proper screen protection
Use sticky anti-static doormats
Cleaning
Quick Fix Options
Policies & procedures
Change control: plan what you intend to do, implement, track the effects, have a back-out plan
Training, training, training
Documentation
Separation by function
Conduct Computational Fluid Dynamics (CFD) studies
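The change-control steps can be enforced as a checklist: a change request is not approved until the plan, the implementation steps, the effects to track, and a back-out plan are all filled in. A sketch with a hypothetical `ChangeRequest` record; a real change-management tool would add approvals, scheduling, and audit history:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ChangeRequest:
    """Hypothetical change record mirroring the four change-control steps."""
    title: str
    plan: str = ""                                              # what you plan to do
    implementation_steps: List[str] = field(default_factory=list)
    systems_to_watch: List[str] = field(default_factory=list)   # where to track effects
    back_out_plan: str = ""


def missing_elements(change: ChangeRequest) -> List[str]:
    """List the change-control elements that are still empty."""
    missing = []
    if not change.plan.strip():
        missing.append("plan")
    if not change.implementation_steps:
        missing.append("implementation steps")
    if not change.systems_to_watch:
        missing.append("effects to track")
    if not change.back_out_plan.strip():
        missing.append("back-out plan")
    return missing


def ready_for_approval(change: ChangeRequest) -> bool:
    return not missing_elements(change)


if __name__ == "__main__":
    cr = ChangeRequest(
        title="Replace PDU breaker in row C",
        plan="Swap the breaker during the Sunday maintenance window",
        implementation_steps=["Move load to B-side feed", "Replace breaker",
                              "Restore A-side feed"],
        systems_to_watch=["Row C rack power", "UPS load"],
    )
    print(ready_for_approval(cr), missing_elements(cr))
```

The example request is rejected (`False ['back-out plan']`) until someone writes down how to undo the change.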
What do these jobs have in common?
They need to be Certified Professionals
Certification Required?
What certification do you ask him for when you let him run your mission-critical data center?
Room Construction Modeling
Cabinet Modeling