Managing The Infrastructure Of Data Centers
David Cuthbertson
Square Mile Systems Ltd
Square Mile Background
• Develop toolsets, training and techniques for operational management of complex IT infrastructure
• Focus areas
– Data center management
– Connectivity management
– System change impact analysis
– Documentation techniques
– Infrastructure visualisation
• All technologies!
Business Processes: Departmental, Company
Services: End user, infrastructure, supplier
Applications: PC, server, mainframe, SOA
Fixed Infrastructure (Cabling, Power, Cabinets, Buildings)
Hardware Infrastructure: Network, Servers, UPS, Storage, Other
Virtual Infrastructure: Network, Servers, Storage, DBMS
Data Center Infrastructure
Seminar overview
• Understand management issues to aid selection of technology and techniques
• Communicating “best practice”
• Develop your own perspectives and learn from others
• Exercises covering typical DC issues
• Have a few laughs on the way
• Only two uses of the word “green”
Common Management Issues
• Data Centre is often not visible, nor are the staff
• An “if it’s not broke” attitude isn’t good; infrastructure risks need to be managed
• IT groups are often task and project orientated, with less focus on operational issues
• Getting funds allocated for improving management techniques is difficult
• Skill sets need to evolve with technology and organisation control requirements
Defining “Management”
• Planning what needs to be done to achieve a particular result
• Organising and directing resources
• Controlling and making adjustments as needed
• Motivating all those involved
Management Maturity
1. Reactive: individual approach
2. Repeatable: some process, often informal
3. Defined: process documented and explained
4. Managed: process checked and reviewed for gaps
5. Optimised: process open to external review and updated regularly
Where might you be if you
a) Didn’t label patch cables?
b) Labelled patch cables consistently?
c) Audited records against patching documentation?
Why Improve DC Management?
1. New technology demands
– Cooling, power, cabling, weight
2. Save on capital and operational costs
– Optimise existing facilities
– Reduce power and other costs
3. Less tolerance of outages and disruption
4. Speed of change
5. External need for evidence of control
Changing Requirements
                               BEFORE          AFTER
No. of servers per cabinet     3-6             30-40
Power dissipated per cabinet   300-2000W       3kW - 25kW
Current service to cabinet     16A             2x32A or 3 phase
Types of equipment             Servers         Blade Servers
                               Monitor         Power Distribution Units
                               KVMs            MidSpan Boxes
                               Power Strips    Disk Arrays (Storage)
                               UPS             Smart Power Strips
                                               Regular Power Strips
Network types                  100M            1G, 10G, SAN
No. of cables
  Power (per server)           1 or 2          2 to 6
  Network (per server)         1 or 2          5 to 10
  Cabinet total                20-30           300 - 400
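The current and power rows of the table can be related with a quick calculation. The sketch below is illustrative only and assumes a 230V single-phase supply with a power factor of 1; neither figure appears on the slide.

```python
# Assumption: 230 V single-phase supply and unity power factor (not from the slide).
SUPPLY_VOLTS = 230.0


def feed_capacity_watts(amps: float, feeds: int = 1, volts: float = SUPPLY_VOLTS) -> float:
    """Maximum power a set of identical feeds can deliver, ignoring any derating."""
    return amps * volts * feeds


def amps_required(watts: float, volts: float = SUPPLY_VOLTS) -> float:
    """Current drawn by a load of the given power."""
    return watts / volts


before = feed_capacity_watts(16)           # single 16A feed
after = feed_capacity_watts(32, feeds=2)   # 2x32A feeds
print(f"16A feed: {before:.0f} W, 2x32A feeds: {after:.0f} W")
print(f"A 25 kW cabinet draws about {amps_required(25_000):.0f} A")
```

Under these assumptions even two 32A feeds fall short of the 25kW upper figure, which is consistent with the table listing three-phase supplies as the alternative.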
New Technology Challenges
Sun Blade 8000 Blade Chassis
– 4 Power supplies (N+1) 9kW
– 3 chassis per rack – 27kW?
HP C7000 Blade Chassis
– Up to 6 Power Supplies 13kW
– 4 chassis per rack
Cisco Nexus 7000 Data Center Switch
– 3 Power Supplies 12kW
– Up to 384 ports
And in the next few weeks?
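The per-rack totals implied by the chassis figures can be worked out directly. This is a rough sketch: it treats the quoted power supply figures as an upper bound per chassis, which is an assumption, and it compares them against the 25kW upper figure from the Changing Requirements table.

```python
# Figures quoted on the slide; treated here as an upper bound per chassis (assumption).
chassis = {
    "Sun Blade 8000": {"kw_per_chassis": 9, "chassis_per_rack": 3},
    "HP C7000": {"kw_per_chassis": 13, "chassis_per_rack": 4},
}

CABINET_UPPER_KW = 25  # upper end of the "AFTER" range in the earlier table


def rack_kw(spec: dict) -> int:
    """Worst-case rack power if every chassis drew its quoted figure."""
    return spec["kw_per_chassis"] * spec["chassis_per_rack"]


rack_totals = {name: rack_kw(spec) for name, spec in chassis.items()}
over_limit = [name for name, kw in rack_totals.items() if kw > CABINET_UPPER_KW]

for name, kw in rack_totals.items():
    print(f"{name}: up to {kw} kW per rack")
```

Actual draw is normally lower than supply capacity, so these totals are planning ceilings rather than measurements.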
Starting Well
1. Specify and build the infrastructure using a standards based approach
– TIA942 data centre design
– Other standards TIA, EN, etc.
2. Test installation for conformance to requirements
3. Handover of documentation, skills transfer and operational procedures to customer
So How Did This Happen?
Different Working Practices
Exercise 1 Cable Management
Both are cabling implementations using different management styles
1. Explain why the cabinet on the left is preferred
2. Explain what you may have to do to keep the left cabinet maintained in the same style
Exercise 1 Review
Risk Organisation
Speed Process
Cost Technology
So…
You may understand, but you can’t assume others do
Professionally designed infrastructure will be compromised without professional management practices!
Defining Best Practices
• You could define your own best practice
– Authority
– Experience
– Technically qualified
– Best communicator
– Management information
• Or you could adopt a framework
– Quicker path to end result with fewer opinions
Management Frameworks
ITIL / ISO20000: Service Management
BS25999: Business Continuity
ISO27001: Information Security
CoBit: IT Governance
All have a continuous process (Plan, Do, Check, Act)
But no equivalent for data center management!
ISO20000/ITIL V2
Service Delivery Processes
– Security Management
– Service Continuity & Availability Management
– Service Level Management
– Service Reporting
– Capacity Management
– Financial Management
Release Processes
– Release Management
Resolution Processes
– Incident Management
– Problem Management
Relationship Processes
– Business Relationship Management
– Supplier Management
Control Processes
– Configuration Management
– Change Management
Why have a framework?
• Common understanding of complex issues
– Terms, Processes, Roles
– Measurement, identification of gaps
– Communication
– Training for individuals and teams
• Focus provided
– Easier adoption of industry techniques
– Overcomes internal reluctance to change
Example of Best Practice
Procuring a new server
– Policies: sign off, payment
– Ordering process: life cycle
– Purchase orders: common reference
– Roles and responsibilities: specify, order and approve
Best Practice in Data Centers
• Design
– TIA942 standard, Uptime Institute, manufacturer guidelines
• Build and Install
– Standards and regulations TIA, etc.
• Operate
– ???
– EU Code of Conduct for Data Centers: new!
Different Power Views
[Figure: remote controlled power strip with 16 numbered outlets, LAN and serial ports and a current display, labelled 100-240V, 50~60Hz, 1.2A, supplying a KVM and servers; two 16A feeds]
What should the working limit be for the power strip?
Exercise 2 Power Calculations
What types of equipment power/current rating should be known and used when assessing and planning changes?
So…
• Monitoring tools are useful, but they only tell you what they see
• For managing power infrastructure we may need multiple values
– Manufacturer power rating
– Derated power: often 60% of manufacturer
– Design power
– Actual power
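The different power values can be held side by side for each device and totalled per power strip. The following is a minimal sketch, not a recommended method: the 60% derating factor comes from the slide, while the 230V supply, the 80% working fraction and the device names and wattages are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

DERATING_FACTOR = 0.6      # "often 60% of manufacturer", per the slide
SUPPLY_VOLTS = 230.0       # assumption: single-phase supply voltage
WORKING_FRACTION = 0.8     # assumption: site policy for loading a feed


@dataclass
class Device:
    name: str
    nameplate_w: float                # manufacturer power rating
    actual_w: Optional[float] = None  # measured draw, None if not monitored

    @property
    def derated_w(self) -> float:
        return self.nameplate_w * DERATING_FACTOR


def strip_report(devices: List[Device], feed_amps: float) -> dict:
    """Total each power view for one strip and compare with a working limit."""
    limit_w = feed_amps * SUPPLY_VOLTS * WORKING_FRACTION
    derated = sum(d.derated_w for d in devices)
    return {
        "limit_w": limit_w,
        "nameplate_w": sum(d.nameplate_w for d in devices),
        "derated_w": derated,
        "measured_w": sum(d.actual_w for d in devices if d.actual_w is not None),
        "unmonitored": [d.name for d in devices if d.actual_w is None],
        "derated_within_limit": derated <= limit_w,
    }


devices = [
    Device("server-a", nameplate_w=750, actual_w=310),
    Device("server-b", nameplate_w=750),            # no monitoring data
    Device("kvm", nameplate_w=120, actual_w=40),
]
report = strip_report(devices, feed_amps=16)
print(report)
```

The `unmonitored` list makes the first bullet concrete: the measured total looks low partly because one device is invisible to the monitoring tool.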
Data Center Management
• No public framework exists for data centers – yet…
• In other areas it is easier
– ITIL, ISO20000: Service Management
– ISO27001: Information Security
– Cobit: IT Controls
– Prince 2: Projects
– ISO9000/6 Sigma: Quality
Exercise 3 – Wasted Costs
Give an example where, because of a lack of knowledge or understanding, unplanned or unnecessary costs were incurred
Exercise 4 Server Provisioning
• You have received a request to move 10 existing servers into a data centre.
• What tasks might have to be done to fulfil this request?
Exercise 4 Server Provisioning
Assess
Plan
Implement
Test
Completion
Concentrate on these two and the tasks appropriate to each
It is important to identify specific tasks and whether they should be in the assessment or detailed planning phases
Provisioning Review
What did we learn?
• Types of information required at each stage
• Tracking and control
• Communication needs
• Complexity of infrastructure changes
Managing Existing Data Centers
• Environment limits
• Information sets: formal and informal
• Working practices: formal and informal
• Roles / responsibilities
• Current issues
• Establish priorities
Typical Power Distribution
[Diagram labels: incoming feeds (3 phase, 100A each), UPS, PDU, circuit breakers (32A, 16A), 32 & 16 Amp sockets, power strips, remote controlled power strips (10A, 16A, Ethernet)]
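A distribution chain like this one (PDU, circuit breakers, power strips) can be modelled as a tree so that loads roll up to each breaker. The sketch below is a simplification: it assumes single-phase loads that add linearly in amps, the ratings echo those on the diagram, and the node names, loads and the 60% threshold are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    name: str
    rating_a: float                  # breaker, strip or feed rating in amps
    load_a: float = 0.0              # load attached directly at this point
    children: List["Node"] = field(default_factory=list)

    def total_load(self) -> float:
        """Load at this point plus everything downstream of it."""
        return self.load_a + sum(c.total_load() for c in self.children)


def overloaded(node: Node, threshold: float = 1.0) -> List[str]:
    """Names of nodes whose rolled-up load exceeds threshold * rating."""
    found = [node.name] if node.total_load() > node.rating_a * threshold else []
    for child in node.children:
        found.extend(overloaded(child, threshold))
    return found


strip_a = Node("strip-A", rating_a=16, load_a=9)
strip_b = Node("strip-B", rating_a=16, load_a=12)
breaker = Node("breaker-1", rating_a=32, children=[strip_a, strip_b])
pdu = Node("PDU", rating_a=100, children=[breaker])

print(overloaded(pdu))                 # nothing exceeds its full rating
print(overloaded(pdu, threshold=0.6))  # stricter working limit flags two nodes
```

Running the same check with a lower threshold shows which points are closest to tripping before any of them actually exceeds its rating.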
Exercise 5 – Power Trips
• Your first objective as the new data centre manager is to prevent circuit breakers tripping unexpectedly due to loading issues.
• What would you do and in what order?
– One day
– One week
– One month
Establishing a Baseline
• Know your environment design limits
• Understand the “gaps”
– Roles
– Knowledge
– Practices
• Decide on priorities and actions
Baseline – troubleshooting
1. Establish control
2. Assess information provided
3. Create plan of action
4. Gather more information if required
5. Stabilise
6. Change process
7. Optimise
Establish Design Limits
• Room
• Architectural and Structural: weight
• Mechanical: cooling, fire detection / suppression
• Electrical: power
• Cabling standards and limitations
Controlling the Environment
• Known design limits
• A baseline of the current estate
• Change approval process
• Forward planning for capacity
• Regular reviews against limits
• Maintenance practices
– Routine
– Verification on process adoption
Is this Rack Full?
[Figure: front view of rack 01-07 showing patch panel PP01-07-01 (ports A001 to A024), power strips PWR01-07-A and PWR01-07-B, server SVR-BHAM-010701, servers UK_BIRM_UX05 to UK_BIRM_UX08 and cable management 01-07-04]
It depends on
• Space
• Weight
• Power
• Cooling
• Connectivity
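The five dimensions above can be checked together when a new device is proposed for a rack. This is an illustrative sketch only; the field names, units and all the numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RackBudget:
    u_space: int          # rack units
    weight_kg: float
    power_w: float
    cooling_w: float
    network_ports: int


DIMENSIONS = ("u_space", "weight_kg", "power_w", "cooling_w", "network_ports")


def blocking_constraints(limit: RackBudget, used: RackBudget, request: RackBudget) -> List[str]:
    """List the dimensions where used + request would exceed the rack limit."""
    return [
        dim for dim in DIMENSIONS
        if getattr(used, dim) + getattr(request, dim) > getattr(limit, dim)
    ]


limit = RackBudget(u_space=42, weight_kg=800, power_w=4000, cooling_w=4000, network_ports=48)
used = RackBudget(u_space=20, weight_kg=350, power_w=3600, cooling_w=3600, network_ports=30)
request = RackBudget(u_space=2, weight_kg=25, power_w=600, cooling_w=600, network_ports=4)

print(blocking_constraints(limit, used, request))
```

In this made-up case the rack is less than half occupied by space yet already "full" for power and cooling, which is the point of the slide.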
Data Center Documentation
• It’s common sense that you should know what is in your data centre and how it is configured
• But achieving this is not so easy
• Why is it so difficult?
Data Center Documentation
• Commissioning documentation
– Project plans and designs
– Testing results
– Initial systems provision – BMS
• Operational documentation
– Various sets for ongoing management
Exercise 6 – Data Center Records
The 10 servers have been successfully installed in the data center.
What records or systems would you expect to have been updated or modified directly as a result of the additional servers?
Data Center
A to Z
Different Teams, Different Focus
Fixed Infrastructure (Cabling, Power, Racks, Rooms, Buildings)
Hardware Infrastructure: PCs, Network, Servers, UPS, Storage, Other
Virtual Infrastructure: PCs, Network, Servers, Storage, DBMS
Applications: PC, server, mainframe, SOA
Services: End user, infrastructure, supplier
Business Processes: Departmental, Company
Teams: Service Management, Data Centre, Networks (LAN/SAN), Applications, Mid-range Servers, Systems, Desktops (IMAC)
Different views of a server
• Floor Plan
• Rack Position
• Service impact
• Power Supply
• Network Connections
• H/W Build
[Figure: blade chassis BLADE_BIRM01 with blades UK_BIRM01_BLADE-01 to -05, -09, -10 and -12, and switches BLADE-BIRM01.BLADE-SW1 and BLADE-BIRM01.BLADE-SW2]
Recommended Information Sets
• Space
• Environment (power, cooling)
• Connectivity (power, networks)
• Asset and Inventory controls
• Device management
• Service management
Space Management
• Space management systems
• Floor layouts and plans
• Rack layouts
• Floor loading
• Cabinet functions (customer, comms, servers etc.)
Environment Management
• Building management systems
– Fire, cooling, temperature, power, humidity
• Power management systems
– PDU, UPS, power strips
• Access controls
• Power distribution diagrams and lists
• Current and projected power / cooling
– Hot/cold aisle views
Connectivity Management
TIA 942
Cabling architecture involving rooms, zones, cabinets, paths
Where to Start?
• Structured cabling only
• KVM Architecture
• LAN diagrams
• Storage diagrams
• Patching spreadsheets
• Inventory list
• KVM
• WAN diagrams
• Point to Point Cabling
• Building wiring diagrams
• Asset list
• Legacy systems
• Backbone switches
• IIS Architecture
• Power distribution
• Edge switches
• Power architecture
• Blade switches
• Computer room layout
• PDUs
• Circuit breakers
• Labelling standards
• SAN Architecture
• PABX port mapping
• Power strip connections
• LAN Architecture
Use the Boston Matrix Method
Identifying Focus for a Power Baseline
[Matrix: Impact of Power Outage (Low to High) against Amount of Connections (Low to High)]
Items plotted, grouped as on the slide:
• PDU, Circuit Breakers, UPS, Mainframe, Storage, Firewalls
• Power Strips, Servers, Networks
• Modems, Management Tools
• Lighting, Office outlets
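Placing items on a two-by-two matrix of outage impact against number of connections can be done with a simple scoring pass. The sketch below is hypothetical throughout: the 1 to 10 scores and the threshold are invented for illustration and do not reproduce where the slide places each item.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def quadrant(impact: int, connections: int, threshold: int = 5) -> str:
    """Name the matrix quadrant for a pair of 1-10 scores."""
    impact_level = "High" if impact >= threshold else "Low"
    conn_level = "High" if connections >= threshold else "Low"
    return f"{impact_level} impact / {conn_level} connections"


def group_by_quadrant(scores: Dict[str, Tuple[int, int]], threshold: int = 5) -> Dict[str, List[str]]:
    """Group item names by the quadrant their scores fall into."""
    groups = defaultdict(list)
    for item, (impact, connections) in scores.items():
        groups[quadrant(impact, connections, threshold)].append(item)
    return dict(groups)


# Hypothetical scores: (impact of power outage, amount of connections)
scores = {
    "UPS": (10, 9),
    "Firewalls": (9, 2),
    "Power strips": (4, 8),
    "Lighting": (2, 1),
}
groups = group_by_quadrant(scores)
for name, items in groups.items():
    print(f"{name}: {', '.join(items)}")
```

A baseline project would typically start with whichever quadrant the team agrees carries the most risk for the least documentation effort.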
Exercise 7 – LAN Connectivity
Identifying Focus for LAN Baseline Project
[Matrix: User Impact of Disconnect (Low to High) against Amount of Connections (Low to High), quadrants numbered 1 to 4]
1. Backbone cabling
2. Cabinet/Zone cabling
3. Floor boxes
4. Servers
5. Core Switches
6. Edge Switches
7. Wireless Access Points
8. Routers
9. Firewalls
10. SANs
11. Power strips
12. KVMs
13. IP phones
14. Desktops
To Manage Connectivity
1. Document the fixed infrastructure first
– Backbone, power, vertical
2. The active components
– Switches, servers, SAN etc.
3. Finally the connectivity
– Local, path and endpoints
Defining the Level of Detail
1. Local patch?
2. End to End path?
3. All devices connected to the switch?
[Diagrams: each option drawn as connections through one or two patch panels]
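The difference between recording a local patch and recording an end to end path can be shown with a small data model. In this sketch each documented connection is a pair of port identifiers, and a breadth-first search traces the path between two endpoints; the port names are hypothetical.

```python
from collections import defaultdict, deque
from typing import List, Optional, Tuple


def trace_path(links: List[Tuple[str, str]], start: str, end: str) -> Optional[List[str]]:
    """Return the chain of ports from start to end, or None if undocumented."""
    graph = defaultdict(set)
    for a, b in links:
        graph[a].add(b)
        graph[b].add(a)

    parents = {start: None}
    queue = deque([start])
    while queue:
        port = queue.popleft()
        if port == end:
            path = []
            while port is not None:
                path.append(port)
                port = parents[port]
            return path[::-1]
        for neighbour in sorted(graph[port]):
            if neighbour not in parents:
                parents[neighbour] = port
                queue.append(neighbour)
    return None


links = [
    ("SVR-A:eth0", "PP-A:01"),     # local patch cord
    ("PP-A:01", "PP-B:01"),        # fixed cabling between patch panels
    ("PP-B:01", "SW-1:gi1/0/1"),   # patch cord to the switch
    ("SVR-B:eth0", "PP-A:02"),     # local patch only, rest undocumented
]

print(trace_path(links, "SVR-A:eth0", "SW-1:gi1/0/1"))
print(trace_path(links, "SVR-B:eth0", "SW-1:gi1/0/1"))
```

The second lookup returns `None`: a record that stops at the local patch cannot answer which switch port a server depends on.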
Asset Controls
• Lists of all devices and assets
• Their current status and location
• Previous history and audit trail
• Often combined with maintenance and procurement data
• Auto-discovery can help, but is often of limited value in data centers
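An asset record that keeps current status, location and an audit trail might look like the sketch below. It is only one possible shape; the field names, status values and identifiers are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional, Tuple


@dataclass
class Asset:
    asset_id: str
    status: str
    location: str
    # Each entry: (time of change, previous status, previous location)
    history: List[Tuple[datetime, str, str]] = field(default_factory=list)

    def update(self, status: Optional[str] = None, location: Optional[str] = None) -> None:
        """Record the previous state, then apply the change."""
        self.history.append((datetime.now(timezone.utc), self.status, self.location))
        if status is not None:
            self.status = status
        if location is not None:
            self.location = location


asset = Asset("SVR-001", status="in stock", location="store room")
asset.update(status="live", location="rack 01-07")
print(asset.status, asset.location, len(asset.history))
```

Because every change passes through `update`, the history can later be audited against change records.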
Device or Element Management
• Network, server, storage monitoring
• Configuration systems
• Automated deployment / provisioning
• Network and other architecture diagrams
• Automated discovery and scanning
• Backup and failover
Service & Risk Management
• Help or service desk system
• Project control or workflow system
• Services maps
– Devices mapped to critical services
• Service monitoring tools
• Billing and charging
• Recovery planning and testing
DC Capacity Management
• Demand management to capture requests
• Existing + allocated demand recorded
• Capacity Plan and “database”
• Reporting and trending on
– Space
– Power
– Cooling
– Network, SAN Port availability
– Resource (staff)
• “Green” reporting
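Recording existing and allocated demand against limits allows a simple headroom calculation per resource. The following sketch uses invented resource names and figures purely for illustration.

```python
from typing import Dict


def headroom(limit: Dict[str, float], existing: Dict[str, float], allocated: Dict[str, float]) -> Dict[str, float]:
    """Capacity left after existing use and already-allocated requests."""
    return {r: limit[r] - existing.get(r, 0) - allocated.get(r, 0) for r in limit}


def shortfalls(available: Dict[str, float], request: Dict[str, float]) -> Dict[str, float]:
    """Resources where a new request exceeds what is available, and by how much."""
    return {r: need - available[r] for r, need in request.items() if need > available[r]}


# Hypothetical figures for one computer room
limit = {"space_u": 420, "power_kw": 80.0, "cooling_kw": 90.0, "network_ports": 480}
existing = {"space_u": 300, "power_kw": 62.0, "cooling_kw": 60.0, "network_ports": 400}
allocated = {"space_u": 40, "power_kw": 15.0, "cooling_kw": 10.0, "network_ports": 60}

available = headroom(limit, existing, allocated)
request = {"space_u": 20, "power_kw": 5.0, "network_ports": 16}

print(available)
print(shortfalls(available, request))
```

Counting allocated demand matters here: without it the room would appear to have 18kW of spare power rather than 3kW.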
Charging and Funding
Different perspectives
• Space
• Power
• Cooling
• Network Ports used
• Shared Infrastructure Costs and Support
• Hardware Maintenance Costs and Support
• Operations Costs and Support
Exercise 8 – Reducing Costs
• Without buying new technology or changing staffing, you are tasked with improving the cost base of the data center.
• What management initiatives can you undertake that will reduce direct costs?
– You have your team, they need direction
Meeting the Needs of 3rd Parties
• SOX, PCI, FSA, auditors, etc.
• Building & planning requirements
• Employment and buildings legislation
– Disability
– Health and safety
– Electricity at work
– Carbon tax
– And others
• Insurance
What would be sufficient evidence to satisfy them of your controls in most cases?
Energy Issues
• EU Regulations already in place
– Energy performance of buildings
– Energy-using Products Directive (colour codes on white goods)
– WEEE and RoHS directives
• US Green Building Council LEED Program
– Leadership in Energy and Environmental Design
• The Green Grid programme
• EU Code of Conduct for Data Centres
– Completed 1Q2009
– Covers all data center, server and equipment rooms
• UK draft climate change bill
• Carbon trading
EU Code of Conduct
• Aim is to inform and stimulate data center owners to reduce energy consumption
– Understand energy usage
– Raise awareness
– Communicate practices which will reduce energy consumption
• Voluntary at present
• Available at http://dcsg.bcs.org
EU Code of Conduct
• Measurements against best practices for
– Cooling
– Power equipment
– Other data center equipment
– Data center utilisation, management & planning
– IT equipment and services
– Energy monitoring
• Temperature and humidity requirements for equipment
– Suggested limits are 5ºC to 40ºC
EU Code of Conduct
• Additional best practices document is useful for all as it covers
– Design
– Operate
– New equipment and retrofit issues
– IT equipment selection
– Power, cooling, storage
– Monitoring and reporting
Managing Risk
What presents the greatest risk?
Evidence of Conformance
• Policies covering control, security etc.
• Evidence of processes that support the policies
– Change records
– Build and test records
– Written material or email trails
– Communications
– Incident reviews
– Access lists
Access and Security Practices
• Secure access is a common feature in data centres
• Access control will depend on organisational issues
– Building access
– Room access
– Cabinet, equipment
– Layering: multiple needs to authenticate access
• Many types
– What you have: card or token
– What you know: pass code, keypad
– Who you are: biometric (finger, eye)
• Use a standard framework (ISO27001, CoBit etc.)
Security Impacts
• Physical access will be more difficult
– Access requests and tracking
• Changes will have to be logged at an increased level of detail
• Verification / audits will be more common to meet auditor / customer needs
• Without improving the infrastructure documentation, changes will take longer
Current Issues
Security of data on individuals’ financial and personal lives is becoming high profile. The data in the data center is valuable!
www.idtheftcenter.org
ITRC20090304-01, NYPD Pension Fund, 3/4/2009
A civilian official of the NYPD’s pension fund has been charged with stealing the identities of 80,000 current and retired cops, sources said. He allegedly got into a secret backup-data warehouse on Staten Island last month and walked out with eight tapes packed with Social Security numbers, direct-deposit information for bank accounts, and other sensitive material.
Example
Management Maturity
1. Reactive: individual approach
2. Repeatable: some process, often informal
3. Defined: process documented and explained
4. Managed: process checked and reviewed for gaps
5. Optimised: process open to external review and updated regularly
What will be different next year?
Managing the Infrastructure
• Planning what needs to be done to achieve a particular result
• Organising and directing resources
• Controlling and making adjustments as needed
• Motivating all those involved
Thank you for your attention
Questions or feedback?
David Cuthbertson
Square Mile Systems Ltd
www.squaremilesystems.com
www.assetgen.com