Myths and realities about designing high availability data centers

Post on 07-Aug-2015

1

Myths and realities about designing high availability data centers

Tier III and Tier IV: What do you need to know?

Steven Shapiro, P.E., ATD

Mission Critical Practice Lead

2

Data Center World – Certified Vendor Neutral

Each presenter is required to certify that their presentation will be vendor-neutral.

As an attendee you have a right to enforce this policy of having no sales pitch within a session by alerting the speaker if you feel the session is not being presented in a vendor neutral fashion. If the issue continues to be a problem, please alert Data Center World staff after the session is complete.

3

Agenda

• Tier definitions

• Nines

• Tier III/IV issues – one line diagram

• Factors affecting performance

• Reliability and availability

• Causes of critical failures

• Key takeaways

• Questions

4

Tier Definitions

5

Things that are not tier-dependent

• Site location

• Facility construction

• Quality of equipment

• Facility commissioning

• Age of site

• Operations and maintenance program

• Personnel training

• Level of personnel coverage

Tier Definitions

6

• Align business mission and facility performance expectation

• Benchmark against the industry

• Assist in developing business case for capital expenditures

Tier Requirements

User must define tier requirements for a facility

7

Five 9’s Refers To Availability

• Availability (A) is the long-term average percentage of time that a component or system is in service and satisfactorily performing its intended function.

• Five nines (99.999%) availability means:

  – About 5.3 minutes of downtime each year

  – About 1.75 hours of downtime every 20 years

• Availability does not specify how often an outage occurs

“Nines”
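The downtime figures for a given number of nines follow directly from the definition of availability. The snippet below is a minimal sketch of that arithmetic; the function name and the 365-day year are assumptions made for illustration, not part of the presentation.

```python
# Convert an availability percentage into expected downtime.
# Assumption: a 365-day year (525,600 minutes); leap years are ignored.
MINUTES_PER_YEAR = 365 * 24 * 60


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Long-term average downtime per year implied by an availability figure."""
    return (1.0 - availability_percent / 100.0) * MINUTES_PER_YEAR


five_nines = downtime_minutes_per_year(99.999)
hours_per_20_years = five_nines * 20 / 60

print(f"{five_nines:.2f} minutes per year")          # 5.26 minutes per year
print(f"{hours_per_20_years:.2f} hours per 20 years")  # 1.75 hours per 20 years
```

The two results describe the same availability: one short outage every year and one longer outage every 20 years average out identically, which is why availability alone says nothing about how often outages occur.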

8

Tier Requirements

                              Tier I    Tier II   Tier III              Tier IV
Number of Delivery Paths      1         1         1 Active, 1 Passive   2 Active
Redundancy                    N         N+1       N+1                   2N Minimum
Compartmentalization          No        No        No                    Yes
Concurrent Maintainability    No        No        Yes                   Yes
Fault Tolerance               No        No        No                    Yes
Availability (%)              99.671    99.749    99.982                99.995
Downtime (hr/yr)              28.8      22        1.6                   0.4

9

• Tier I: $10,000 US/kW of useable UPS Power Output

• Tier II: $11,000 US/kW of useable UPS Power Output

• Tier III: $20,000 US/kW of useable UPS Power Output

• Tier IV: $22,000 US/kW of useable UPS Power Output

• Plus $225 US/SF of computer room

Based on a 15,000 SF white space, +/- 30%

Data Center Costs

From The Uptime Institute
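The cost rule of thumb above is a per-kW figure plus a per-square-foot figure, so a rough budget can be sketched in a few lines. This is an illustrative sketch only: the function name and the 1,000 kW example load are hypothetical, and reading the +/- 30% as an accuracy band on the total is an assumption.

```python
# Rough data center cost estimate from the per-kW and per-SF figures on the slide.
COST_PER_KW_USD = {"I": 10_000, "II": 11_000, "III": 20_000, "IV": 22_000}
COST_PER_SF_USD = 225  # per square foot of computer room


def estimate_cost_usd(tier: str, ups_kw: float, computer_room_sf: float) -> float:
    """Usable UPS output (kW) times the tier rate, plus the computer room area charge."""
    return COST_PER_KW_USD[tier] * ups_kw + COST_PER_SF_USD * computer_room_sf


# Hypothetical example: Tier III, 1,000 kW of usable UPS output, 15,000 SF white space.
base = estimate_cost_usd("III", 1_000, 15_000)
low, high = base * 0.7, base * 1.3  # assumed meaning of the +/- 30% note

print(f"Base estimate: ${base:,.0f}")
print(f"Range: ${low:,.0f} to ${high:,.0f}")
```

For the example inputs the base figure is $20,000,000 for power capacity plus $3,375,000 for floor area, or $23,375,000 in total.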

10

One Line Diagram

Figure labels: 2N utility, N+2 generators, 2N generator distribution, 2N UPS, 2N distribution, mechanical UPS

11

2N Utility

Not a tier requirement

12

Generator Count and Distribution

• 2N generators not a tier requirement

• Some sort of 2N distribution is a Tier III and IV requirement

13

• UPS can be configured in many ways

• N = the number of modules installed just meets the load (Tier I and II)

• N+1 = the number of modules needed to meet the load plus 1 additional module (Tier III)

Multi-Module UPS System Configuration

14

• UPS can be configured in many ways

• 2N systems = twice the number of systems required to meet the load (Tier IV)

• 2(N+1) systems = twice the number of N+1 systems required to meet the load (Tier IV)

Multi-Module UPS System Configuration
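The redundancy notations on the two UPS configuration slides can be compared by counting modules for a given load. The sketch below assumes identical modules and counts 2N and 2(N+1) as two complete systems; the function names and the example load and module sizes are hypothetical.

```python
import math


def modules_to_meet_load(load_kw: float, module_kw: float) -> int:
    """N: the smallest number of identical modules whose combined rating covers the load."""
    return math.ceil(load_kw / module_kw)


def ups_module_counts(load_kw: float, module_kw: float) -> dict[str, int]:
    """Total installed modules for each redundancy scheme named on the slides."""
    n = modules_to_meet_load(load_kw, module_kw)
    return {
        "N": n,                  # just meets the load
        "N+1": n + 1,            # one spare module
        "2N": 2 * n,             # two systems, each able to carry the load
        "2(N+1)": 2 * (n + 1),   # two systems, each with its own spare module
    }


# Hypothetical example: 1,200 kW critical load served by 500 kW modules.
counts = ups_module_counts(1_200, 500)
for scheme, total in counts.items():
    print(f"{scheme:7s} {total} modules")
```

For this example N is 3, so the installed totals are 3, 4, 6 and 8 modules respectively, which makes the added equipment for each step up in redundancy easy to see.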

15

UPS Systems With External Maintenance Bypass

16

• A mechanical UPS is required to keep data center HVAC systems operational until the generator plant supports the load

• May run CRAC units, secondary or primary pumps, etc.

• Sized to match the data center cooling load and the battery time of the data center UPS

Mechanical UPS

17

Certain things can be overdone.

How Much Redundancy Is Enough?

18

The Cost of Reliability

Chart: cost ($) versus reliability, with the reliability axis marked at 99.0, 99.9, 99.99, 99.999 and 99.9999

19

• Location

• Design

• Redundancy level

• Construction

• Quality of equipment

• Thoroughness of commissioning program

• Age

• Operations & maintenance program

• Personnel training

• Level of coverage

Factors Affecting Performance But Not Tier Level

Lurking vulnerabilities

20

• Document Management

• Maintenance Programs (CMMS)

• Commissioning

• Vendor Management

• Change Management

• Standard and Emergency Operating Procedures

• Training

• Staffing

Factors Affecting Performance But Not Tier Level

21

• Harmonics Analysis

• EMF Studies

• Short Circuit Studies

• Coordination Studies

• CFD Modeling

CFD model figure labels: cold aisle, hot aisle, IT equipment, computer room air conditioning units

Factors Affecting Performance But Not Tier Level

22

• Probability of failure/reliability

• Availability

• MTTF

• MTTR

• Susceptibility to natural disasters

• Fault tolerance

• Single points of failure

• Maintainability

• Operational readiness

• Maintenance program

Reliability Considerations
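Availability, MTTF and MTTR in the list above are linked by the standard steady-state relation A = MTTF / (MTTF + MTTR). The sketch below applies it to two hypothetical systems; the numbers are invented for illustration, and the failures-per-year figure assumes a simple repeating fail-and-repair cycle.

```python
HOURS_PER_YEAR = 8_760


def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability from mean time to failure and mean time to repair."""
    return mttf_hours / (mttf_hours + mttr_hours)


def failures_per_year(mttf_hours: float, mttr_hours: float) -> float:
    """Average number of fail-and-repair cycles in one year."""
    return HOURS_PER_YEAR / (mttf_hours + mttr_hours)


# Two hypothetical systems: rare long outages versus frequent short ones.
rare_long = (9_990.0, 10.0)       # MTTF, MTTR in hours
frequent_short = (999.0, 1.0)

for name, (mttf, mttr) in [("rare/long", rare_long), ("frequent/short", frequent_short)]:
    print(f"{name:15s} A = {availability(mttf, mttr):.4f}, "
          f"{failures_per_year(mttf, mttr):.2f} failures per year")
```

Both systems have an availability of 0.999, yet the second one fails ten times as often, so MTTF and MTTR need to be considered alongside the availability figure.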

23

Single Utility Feeder, Parallel Redundant UPS and Generators, Single-Corded IT Rack

24

2N UPS, N+1 Generators, ASTSs and Dual-Corded IT Rack

25

Two Utility Feeders, 2(N+1) UPS, 2(N+1) Generators, ASTSs, Dual Corded IT Rack

26

Distributed Redundant UPS, N+2 Generators, Two Utility Feeders, ASTSs and Dual Corded IT Rack

27

Reliability Considerations

28

• 2(N+1) / system + system with dual utility feeders is the most reliable topology

• There is no significant reliability improvement in using a 2(N+1) UPS configuration over 2N

• Distributed redundant configuration is less reliable than 2N

• Reliability improves if a second utility feeder is provided

• N+2 and/or 2N generator systems are marginally more reliable than N+1

Reliability Considerations

29

Emergency Diesel Generators

Study performed by Idaho National Engineering Laboratory at nuclear power plants, February 1996

Chart categories: fail to start, fail after ½ hour, fail after 8 hours, fail after 24 hours

Reliability Considerations

30

• A hybrid configuration may be most effective

• STSs on the secondary side of the PDU transformer yield a 2-to-1 reliability improvement over 480 V STSs

• Dual cord has higher impact than the use of STSs

• Ultimate reliability: STS + dual cord

• Assess the condition of the mechanical plant in conjunction with the electrical system

• The facility reliability will be driven by the least reliable component (typically the electrical infrastructure)

Reliability Considerations
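The point that the least reliable component drives the facility can be illustrated with a simple series model, in which every subsystem must work for the facility to work. This is a sketch under stated assumptions: subsystems fail independently, steady-state availability stands in for reliability, and the subsystem names and values are hypothetical.

```python
from math import prod


def series_availability(availabilities) -> float:
    """Availability of a chain of independent subsystems that must all be up."""
    return prod(availabilities)


# Hypothetical subsystem availabilities, not taken from the presentation.
subsystems = {
    "utility and generators": 0.9999,
    "UPS and distribution": 0.9990,
    "mechanical plant": 0.9995,
}

facility = series_availability(subsystems.values())
weakest = min(subsystems, key=subsystems.get)

print(f"Facility availability: {facility:.4f}")
print(f"Weakest subsystem: {weakest} ({subsystems[weakest]:.4f})")
```

In a series model the result can never exceed the weakest subsystem, so improving an already strong subsystem changes little while the weakest one is left as it is.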

31

• Segregate the system into independent blocks

• Eliminate common source components to minimize fault propagation (e.g., LBS, hot-tie, manual bus ties)

• Move single points of failure as close to the load as possible

• Always maintain two independent sources of power to the critical load

• Optimize the design of monitoring and control circuits

• Keep it simple and minimize human intervention

Fundamentals of High Availability Design

32

Causes of Critical Failures

28%

20%

18%

13%

10%

4%

4%

3%

Equipment failure

System design

Human error

Equipment design

Installation error

Commissioning or test deficiency

Maintenance oversight

Natural disaster

33

• Typically a combination of factors

  – External event (power failure)

  – Equipment failure

  – Human factor

  – Latent failures

• Root cause not always easy to ascertain

• Most major failures occur during change-of-state events

  – Loss of utilities

  – System transfers during maintenance activities

• More maintenance does not necessarily mean higher availability

Causes of Critical Failures

34

• What reliability level do you really need based on your business case?

• Do you want concurrent maintainability?

• Do you want fault tolerance?

• Minimize single points of failure within systems

• Ensure adequacy of operations, maintenance and testing programs

• Review/develop SOPs and EOPs

• Review/develop existing documentation

• Review/develop training practices

Key Takeaways

35

Steven Shapiro, PE, ATD

Mission Critical Practice Lead

(914) 420-3213

sshapiro@morrisonhershfield.com

http://www.linkedin.com/in/stevenshapirope

Twitter: @stevenshapirope

Questions?

References:

• Uptime Institute White Paper: Tier Myths and Misconceptions

• Uptime Institute White Paper: Data Center Site Infrastructure Tier Standard - Topology
