infrastructure services operational plan


Upload: stephen-bertino

Post on 15-Apr-2017


Page 1: Infrastructure services operational plan

ABC Infrastructure Services Operational Plan (FY15-FY19)

Version 1..5 draft

Steve Bertino, Director, ITS Operations
Information Technology Services (ITS)
Rochester, NY 14623-5603

Information in this document is subject to change without notice. No part of the contents of this document may be reproduced or transmitted in any form, or by any means, electronic or mechanical, for any purpose, without the written permission of the Rochester Institute of Technology.


Contents

Executive Summary:

Scope

Strategy - Guiding Principles:

Current State

Current Constraints:

Ross Hall DC:

Institute Hall DC:

Future Strategy: Effective 7-1-15 (FY16).

Network Impacts:

Systems/Storage/Backup (Data Protection) Impacts:

Environmental Impact (facilities, power management, cooling):

Data Center Operations Impact:

Costs and Sustainability:

Capacity Management:


Critical Partnerships:

Risks to Successful Execution:

External risk:

Internal Risk:

Executive Summary: Infrastructure Services:

Executive Summary:

Efficient infrastructure services and smart networks provide a strategic advantage. They are the basic services on which all other teaching, learning, research, administrative, and global outreach applications reside. To meet increasing and ever-changing demand, access to infrastructure services must be flexible, secure, and available anywhere and anytime, with enough capacity to absorb unplanned demand from the user community.

Enterprise capabilities are services built on IT infrastructure that are provided to the entire university. As the realm of IT applications continues to expand and mature, the list of enterprise capabilities continues to grow. This strategy addresses the enterprise capability areas that allow members of the Institute community to develop and share solutions that serve the needs of specific units and of the colleges at large. We must provide services in a best-practices manner for the community in all infrastructure service areas, both enterprise and research.


With the operationalization of the Institute Hall data center in 2014, the legacy model for managing data center services fundamentally changed: there are now two data centers on campus, plus additional data center capacity at NYSERNet (Syracuse). Combined with a significant jump in affordable cloud services, this has expanded the infrastructure service delivery models that ABC must consider and leverage.

Guiding principles are applied in the development and execution of this strategy. These principles are intended to ensure that we always remain aligned with Institute strategies and with industry technology and security drivers.

This comprehensive strategy will govern infrastructure design and data center facilities decisions for the next 5 years, or until the strategy is revisited.

The premise of this strategy is:

● Primary services are located in the Institute Hall data center.

● The backup location is the Ross Hall data center.

● The secondary backup location is NYSERNet-Syracuse (out of region).

● The architecture of the services will be such that hybrid cloud and external services can be integrated with ABC services into seamless IT solutions.

● The risk of over-engineering will be managed to avoid additional cost and complexity.


Scope

The scope of this document is to define the current and future state of data center design and management at ABC. The infrastructure services that connect to, and reside within and outside, the data centers are also in scope, as they are all part of the service delivery chain.

Out of Scope:

Sponsored research efforts requiring in-house high performance computing (HPC) services are not accounted for in this plan.

Global computing beyond the services and scale currently provided was not explicitly considered, but we believe it could be supported by the flexible design principles applied.

Strategy - Guiding Principles:

The following guiding principles will be used to guide and shape the future strategy:

● Maximize use of the new Institute Hall data center.

● As a technology-centric institution, we aim to provide visible, usable, and modern technologies.

● Provide secure service protection consistent with the institution’s need for such services.

● Reduce the cost of sustaining multiple data centers (fit for purpose: don’t over-engineer).

● Provide a flexible architecture allowing for hybrid cloud services where appropriate.

● A loss of the entire Henrietta campus is equivalent to the loss of ABC as an entity, so robust off-campus capability for this event is not warranted.

● Remain relatively cost competitive with outsourced data center services.

● Exceed the quality of service and value provided by outsourced data center service providers.

● Make provisioning of data center services fast and flexible.


Current State

ABC's data centers are located on opposite sides of campus, one in Ross Hall and the other in Institute Hall. A third, leased location in the NYSERNet (Syracuse) data center is used to protect critical bootstrap services, business-critical services, and some backup services off campus. The current design has IH as the primary data center and RH as the backup.

Some critical services, including Core services and Enterprise Cloud services, are protected across both data centers, as are co-location customers (some of whom are in both centers). The cloud design allows services to move between data centers without consideration of where back-end data storage resides. This creates the potential for an entire service to reside in only one data center at any given time; in the event of a failure of that service or data center, service recovery would be prolonged. This design also carries a higher sustainability cost over time. In addition, the system architecture is not in sync with the network architecture, which assumes a primary/secondary design rather than an active-active design. Although current network bandwidth can support this, future demand will require significant upgrades and cost.

In the absence of an Institute-level business continuity plan, the DR plans for data center services are based on a loose list of critical services identified by Risk Management. The criteria used to identify these services were very subjective, and many of them are not, in practice, mission critical to the institution. ITS has made some assumptions about its core services, namely those required to reconstitute services and to communicate with the public via the web. These services currently reside in the NYSERNet data center in Syracuse.

Current Constraints:

● Services moving between centers without consideration for serviceability or DR

● Cost of sustaining two tier-3 data centers going forward (Ross Hall investment)

● Co-location clients over-provisioning services in two data centers


Ross Hall DC:

This is the legacy data center, which has been in operation for 20+ years. Its rating is based on the TIA-942 data center standard; the center is estimated at tier-3 on a 1-4 scale, with 4 being the most robust. The tiers are defined as:

● Tier 1 = Non-redundant capacity components (single uplink and servers).

● Tier 2 = Tier 1 + redundant capacity components.

● Tier 3 = Tier 1 + Tier 2 + dual-powered equipment and multiple uplinks.

● Tier 4 = Tier 1 + Tier 2 + Tier 3 + all components are fully fault-tolerant, including uplinks, storage, chillers, HVAC systems, servers, etc. Everything is dual-powered.
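Because each tier in these definitions includes all the requirements of the tiers below it, the rating works like a cumulative checklist. The Python sketch below encodes the plan's simplified definitions (not the full TIA-942 criteria); the `FacilityFeatures` fields and the `estimate_tier` function are hypothetical. Fractional estimates such as tier-2.5 reflect engineering judgment that a yes/no checklist like this does not capture.

```python
from dataclasses import dataclass


@dataclass
class FacilityFeatures:
    """Hypothetical checklist paraphrasing this plan's tier definitions."""
    redundant_capacity_components: bool  # added at Tier 2
    dual_powered_equipment: bool         # added at Tier 3
    multiple_uplinks: bool               # added at Tier 3
    fully_fault_tolerant: bool           # added at Tier 4


def estimate_tier(f: FacilityFeatures) -> int:
    """Return the highest tier whose cumulative requirements are all met."""
    tier = 1  # Tier 1: non-redundant capacity components
    if not f.redundant_capacity_components:
        return tier
    tier = 2
    if not (f.dual_powered_equipment and f.multiple_uplinks):
        return tier
    tier = 3
    if f.fully_fault_tolerant:
        tier = 4
    return tier


# Example: redundant components and dual power, but only a single uplink.
print(estimate_tier(FacilityFeatures(True, True, False, False)))  # -> 2
```

Because the checks stop at the first unmet tier, a facility cannot be rated higher simply by having a Tier 4 feature while missing a Tier 3 one.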

This data center is 1500 square feet with capacity for 48 racks. The UPS and AC units are all nearing the end of their functional life and will require replacement over the next 2-3 years if the data center remains in a primary role. This cost could be approximately $1M.

Institute Hall DC:

This data center is part of a multipurpose facility with public access to the non-data-center areas. The facility was built in 2013 and has some unique “green” engineering as part of its design. The primary “green” features are:

● Free-air cooling

● Equipment rated to run at much warmer temperatures (100°F)

● An air reclamation system, so heat from equipment is reused to heat non-DC areas of the facility

● Flywheel technology for UPS instead of batteries (changing to a combination of flywheel and battery in FY17)

● A natural-gas-powered generator

It is 1600 square feet with capacity for 32 racks. This facility is estimated at tier-2.5, primarily due to the construction decision to make it a multipurpose facility. As a result, there are classroom and lab areas above the center, with water and drainage lines that run through the data center proper. In addition, a second generator is required to make all services, including power, redundant for IH.


Current Data Center DR Protection for Institute Hall (Primary DC):

With Institute Hall acting as the primary data center, the loss of this data center is the greatest concern for data center DR. The following services would remain available if IH were unavailable due to a long-term outage (i.e., major facility damage). All other services would have to be reconstituted in RH in priority order, based on a disaster recovery event plan.

Services available with the loss of IH:

Network (wired and wireless)

Internet

VPN

Radius Authentication

Guest access

Firewall

Voice & Voicemail

Future Strategy: Effective 7-1-15 (FY16-FY19).

Given the current constraints and guiding principles, the identified strategy is to have a primary data center in Institute Hall (tier-2) and a backup center in Ross Hall (tier-2.5), with the NYSERNet (tier-3) site used for running bootstrap services off campus and supporting select data backups.

This strategy allows us to provide fit-for-purpose protection of services and data while keeping ongoing maintenance cost manageable. The design deployed in Ross Hall will allow for expansion in the event of unforeseen computing demand.


Design and implementation standards must be strictly adhered to, so that we do not deviate and end up with a less-than-desired model by accident.

Network Impacts:

No network changes are required to support the recommended strategy.

Benefits over current state:

● A challenge that any data center strategy must address, and one that often becomes most apparent from the perspective of the network, is the risk that community members will deploy under-engineered solutions, that is, solutions that do not account for all the components required to provide a specific level of service. The perception that the network is unlimited and/or free has often made it a neglected area, with services designed without consideration of the hardware and/or bandwidth required to support them. Although the recommended strategy does not eliminate this issue, it is expected to lessen the impact by limiting the opportunity to deploy under-engineered dual-data-center services. Under-engineering also often results in solutions that are configured to use resources inefficiently and/or out of proportion to business needs. This can inflate network requirements, driving unnecessary expenditures and/or consuming resources that should go to other services with a greater positive impact on the organization. It should be noted that under-engineering can still occur, with significant negative impact, within a single data center. In this respect, the processes inherent in this strategy that are intended to ensure designs are “fit for purpose” are expected to significantly reduce its occurrence.

Systems/Storage/Backup (Data Protection) Impacts:

To accomplish this recommended strategy, Systems will work with the Networking team and the business to develop a move plan to relocate identified equipment from Ross Hall to Institute Hall and NYSERNet (as appropriate). The migration schedule is expected to be driven by the business and may take up to several years. An accelerated schedule will be developed should business or environmental conditions require it.

Benefits over current state:

● Systems
    o Better use of purchased resources. Manually splitting resources between 2 data centers and balancing usage requires that thresholds and order points be set for both data centers. Eliminating one of the thresholds will allow ABC to utilize more of the investment earlier in the hardware lifecycle.

● Storage
    o Eliminates the scenario where a server is in one data center and its storage is in the other. This will eliminate unnecessary network utilization, as well as the perception that the environment is less reliable because it is split between 2 buildings.

● Backup
    o Best-practice designs require that Data Protection (backup) not be located in the same location as the data it protects. To fully protect ABC data, a third location is being used for Data Protection (backups). Moving to a single primary location for data will allow ABC to physically locate the Data Protection (backups) environment in Ross Hall, which has the current and future rack space, power, and cooling to allow for this.
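The two placement rules behind the Storage and Backup benefits (a server and its storage share a data center, and backups never share a location with the data they protect) can be checked mechanically against a service inventory. A minimal Python sketch, assuming a hypothetical `ServicePlacement` record per service; the site names follow this plan, but the example services are invented for illustration:

```python
from dataclasses import dataclass

# Site roles under this plan: primary, backup, and off-site.
PRIMARY, BACKUP, OFFSITE = "Institute Hall", "Ross Hall", "NYSERNet"


@dataclass
class ServicePlacement:
    """Hypothetical inventory record for one service."""
    name: str
    compute_site: str  # where the servers run
    storage_site: str  # where the back-end storage resides
    backup_site: str   # where the Data Protection (backup) copy lives


def placement_violations(p: ServicePlacement) -> list[str]:
    """Return a description of each placement rule the service breaks."""
    problems = []
    # Rule 1: a server and its storage should be in the same data center.
    if p.compute_site != p.storage_site:
        problems.append(
            f"{p.name}: compute in {p.compute_site}, storage in {p.storage_site}"
        )
    # Rule 2: backups must not share a location with the data they protect.
    if p.backup_site == p.storage_site:
        problems.append(f"{p.name}: backup co-located with data in {p.storage_site}")
    return problems


# Invented examples: two compliant services and one split across buildings.
compliant = ServicePlacement("web", PRIMARY, PRIMARY, BACKUP)
offsite = ServicePlacement("dns", PRIMARY, PRIMARY, OFFSITE)
split = ServicePlacement("erp", PRIMARY, BACKUP, BACKUP)
print(placement_violations(compliant))  # -> []
print(placement_violations(offsite))    # -> []
print(len(placement_violations(split)))  # -> 2
```

Running a check like this before each move in the migration plan would flag services that still straddle the two buildings.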

Environmental Impact (facilities, power management, cooling):

All non-core and non-tier-1 servers and services will be moved to Institute Hall.

Benefits over current state:

● Power
    o The Ross Hall UPS is still at end of life; however, the replacement will be considerably smaller and less expensive in this scenario.


● HVAC
    o HVAC systems would run less and would therefore likely last longer, deferring the cost of replacement. Redundancy issues would be mitigated by the lower load.

Data Center Operations Impact:

The major changes in Data Center Operations over the next several years will come through further automation of scheduling and of batch job monitoring and management. With new tools in the backup and archive area (CommVault) and in scheduling, greater capability for automated management of these jobs is possible. This will require redefining some of the job roles within this team, as demands shift toward a developer/administrator role rather than a traditional operator role.

Costs and Sustainability:

The current refresh and replacement cycles for the various technology layers are planned and managed on a rolling 5-year fiscal plan, with underpinning technology roadmaps driving the capital budget process. As a frame of reference, the following standards are used to refresh the various components associated with the data centers and supporting services:

● Server – 3 years

● Network – 5-10 years, depending on components

● Storage arrays – 5 years

● UPS – 15-20 years

● AC units – 15-20 years

● Generators – 15-20 years
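Applied to an asset inventory, these cycles give each component a refresh window that can be checked against the rolling 5-year plan. The Python sketch below is illustrative only: fiscal years are plain integers (e.g., 2015 for FY15), single-value cycles are treated as a window of zero width, and the asset list is hypothetical, not ABC's actual inventory.

```python
# Refresh cycles (min_years, max_years) taken from the standards above.
REFRESH_CYCLES = {
    "Server": (3, 3),
    "Network": (5, 10),
    "Storage arrays": (5, 5),
    "UPS": (15, 20),
    "AC units": (15, 20),
    "Generators": (15, 20),
}


def refresh_window(component: str, installed_fy: int) -> tuple[int, int]:
    """Return the (earliest, latest) fiscal year in which a refresh falls due."""
    low, high = REFRESH_CYCLES[component]
    return installed_fy + low, installed_fy + high


def due_by(assets, plan_end_fy: int):
    """Assets whose refresh window opens on or before the plan's last year.

    Overdue assets are included, since their window opened earlier.
    """
    due = []
    for name, component, installed_fy in assets:
        earliest, latest = refresh_window(component, installed_fy)
        if earliest <= plan_end_fy:
            due.append((name, earliest, latest))
    return sorted(due, key=lambda item: item[1])


# Hypothetical inventory: (asset name, component type, fiscal year installed).
inventory = [
    ("IH generator", "Generators", 2013),
    ("IH storage array", "Storage arrays", 2014),
    ("IH server cluster", "Server", 2014),
]
for name, earliest, latest in due_by(inventory, plan_end_fy=2019):
    print(name, earliest, latest)
```

With a plan ending in FY19, the hypothetical server cluster (due 2017) and storage array (due 2019) appear, while the generator (due 2028 at the earliest) does not.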


Capacity Management:

All infrastructure services are monitored in real time for issues, including sudden capacity constraints. The environment is architected either to be self-healing in these situations or to be threshold monitored, giving an engineer time to add resources before customers are impacted. In addition to real-time monitoring, aggregate resource management tools are in place that show growth and forecasts for critical infrastructure resource consumption (see the I&O dashboard for details). These metrics are used as the basis for annual capital budget planning, based on:

● Past growth trends

● Current growth trajectory

● Known projects on the horizon

● Excess capacity factor engineered into growth plans
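The four planning inputs above can be combined into a simple sizing calculation. The Python sketch below is one possible approach, not the method behind the I&O dashboard: it assumes yearly usage figures in a single unit (hypothetical TB of storage), plans to the faster of the long-run compound growth rate and the most recent year's growth, adds known project demand, and applies the excess capacity factor. A standard reorder-point formula is included to show how a monitoring threshold can leave an engineer time to add resources before exhaustion; all function names are hypothetical.

```python
def planning_growth_rate(history: list[float]) -> float:
    """Faster of long-run compound growth and the latest year-over-year growth."""
    if len(history) < 2:
        raise ValueError("need at least two yearly observations")
    years = len(history) - 1
    long_run = (history[-1] / history[0]) ** (1 / years) - 1  # past growth trend
    recent = history[-1] / history[-2] - 1                    # current trajectory
    return max(long_run, recent)


def capacity_to_budget(history, known_projects, excess_factor, installed):
    """Capacity to add next fiscal year (0 if installed capacity suffices)."""
    organic = history[-1] * (1 + planning_growth_rate(history))
    demand = organic + sum(known_projects)  # known projects on the horizon
    target = demand * (1 + excess_factor)   # engineered excess capacity
    return max(0.0, target - installed)


def alert_threshold(capacity, daily_growth, lead_time_days, safety_margin):
    """Usage level at which to alert so new capacity arrives before exhaustion."""
    return capacity - daily_growth * lead_time_days - safety_margin


# Hypothetical storage figures in TB.
print(round(capacity_to_budget([100, 110, 121], [20], 0.2, 160), 2))  # -> 23.72
print(alert_threshold(160, 0.5, 60, 5))  # -> 125.0
```

Planning to the faster of the two growth rates is a deliberately conservative choice; a single threshold per resource pool, rather than one per data center, is what lets more of the installed capacity be used before the alert fires.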

See Attachment A for the current capital schedule (rev. date 2/15).

Critical Partnerships:

The ABC Facilities team is a critical partner, as it is responsible for the buildings and the power coming into the centers.

Our third-party vendors and carriers that provide equipment, power, access, and support for the successful operation of the centers are:

● Eaton

● ASCO

● Onan

● NYSERNet (off-site center)

● Facilities


Risks to Successful Execution:

External risk:

The known risks associated with this strategy stem primarily from the potential for large sponsored research efforts that require in-house data center services. The flexible architecture deployed with this strategy will not accommodate a significant amount of growth without major facilities investments. The lead time for any major renovation would be 18-24 months if facilities construction or improvement is required.

Internal Risk:

A major internal ITS risk is a lack of disciplined design and decision governance practices, which would erode the strategy and lead us to another accidental architecture model in the future that may be unnecessarily expensive and complex. A certain degree of risk also results from the newer support technologies in the Institute Hall data center. ITS has already experienced issues with the flywheel power backup technologies and control issues with the green cooling technologies, and there is a lack of redundant generator power in the primary DC (IH).
