IS 2540 IT Governance: A Practical Transition Strategy. Based on: The Visible Ops Handbook, Behr, Kim & Spafford, 2005. All figures from Visible Ops.

Upload: billy82

Post on 17-Dec-2014

Page 1: Visible Ops

IS 2540

IT Governance:

A Practical Transition Strategy

Based on:

The Visible Ops Handbook, Behr, Kim & Spafford, 2005

All figures from Visible Ops

Page 2: Visible Ops

Visible Ops

• Based on extensive observation / data collection from hundreds of IT organizations

• Goal was to discover what IT practices distinguish high-performing organizations

• Employs a benchmarking approach

Page 3: Visible Ops

Visible Ops

• Main IT success factors
  – Pervasive change management practices
  – Understanding of cause-effect relationships
  – Use of effective & auditable controls

• IT management based on facts, not intuition or gut feel

• 80% of outages due to operator or application errors

Page 4: Visible Ops

Visible Ops

• IT organizational culture problems
  – Bureaucratic CM
  – ‘End runs’ around CM
  – Delusional agility of cowboy culture
  – Control isn’t possible, so page the IT firefighters
  – Firefighting in lieu of fire prevention
  – Auditors see chaos, so they push for more controls
  – IT doesn’t know which controls to implement

• Does COBIT have the right controls?
• In which order should they be implemented?

Page 5: Visible Ops

Visible Ops

• Symptoms of effective IT
  – High service levels / availability
    • High MTBF and low MTTR
  – Lots of changes, successfully implemented
    • 100s-1000s / week
    • > 99% successful
  – Invest in early phases of IT processes
    • Lowers cost of defect repair
    • Sound familiar?

Page 6: Visible Ops

Visible Ops

– Process integration between organizational units
  • Leads to collaborative working relationships
– Compliance-oriented
  • Relevant controls in place & working
  • Controls documented & easily verified
– Low % of unplanned work
  • < 5% spent on unplanned / urgent work
  • Frees up resources for fire prevention
  • Sound familiar?

Page 7: Visible Ops

Visible Ops

– Huge leverage WRT IT assets & human resources
  • Server:SysAdmin > 100:1 (5X the average)
  • Process effectiveness leads to higher productivity
  • Sound familiar?

Page 8: Visible Ops
Page 9: Visible Ops

Visible Ops

• Common cultures of high performers
  – Change management
    • Viewed as absolutely critical
    • Not viewed as bureaucratic
    • All changes must be successful
  – Causality
    • 80% of outages due to changes
    • 80% of MTTR spent finding the change that caused the outage
    • By analyzing changes, the first fix is effective 90% of the time

Page 10: Visible Ops

Visible Ops

– Continual optimization
  • Discover root causes of IT problems
  • Prevent problems before they happen
  • Highest level of compliance with least effort to maintain compliance
  • Sound familiar? (Quality is free!)

Page 11: Visible Ops

Visible Ops

• Common IT processes
  – None succeeded due to COBIT or ITIL!
  – Each rediscovered good practices
  – Each developed their own terminology
  – Causes communication problems
  – Visible Ops standardizes process terminology

Page 12: Visible Ops

Visible Ops

• Visible Ops standardizes processes wrt ITIL process framework
  – Release
  – Control
  – Resolution
  – Relationship
  – Service delivery

Page 13: Visible Ops
Page 14: Visible Ops

Visible Ops

• Successful companies used 3 out of 5
  – Release
    • Invest your efforts in pre-production activities
    • Plan, build, design, configure before release
  – Control
    • Control to prevent service disruptions
    • Effective controls allow greater agility, not less
  – Resolution
    • Minimize rework efforts & downtimes
    • Requires cause-effect knowledge
    • Frees resources for release & control activities

Page 15: Visible Ops
Page 16: Visible Ops

Visible Ops

• Other success factors
  – Controls are visible to management, security & auditors
  – Effective CM must address human factors
  – Rebuilding is easier than repairing

Page 17: Visible Ops

Visible Ops

• Visible Ops key to success
  – Make transition short, easy & practical
    • Multi-year death marches don’t work
      – Could lose management sponsorship
      – Staff will circumvent
  – Use fewest # of processes possible
  – Implement 3 of 4 processes within 90 days

Page 18: Visible Ops

Visible Ops

– Process projects are
  • Definitive w/ clearly defined objective
  • Ordered to build on previous phase
  • Catalytic to free up more resources than it uses
  • Auditable to create ongoing documentation of controls
  • Sustaining by creation of value to enterprise

Page 19: Visible Ops

Visible Ops

• 4 Visible Ops phases
  – Stabilize The Patient
  – Catch & Release and Find Fragile Artifacts
  – Establish Repeatable Build Library
  – Enable Continuous Improvement

Page 20: Visible Ops
Page 21: Visible Ops

Visible Ops

• Stabilize The Patient
  – Medical triage for IT
  – Goal
    • Reduce unplanned work to < 25%
    • Frees resources for more productive work

Page 22: Visible Ops

Visible Ops

– Symptoms
  • Unplanned work 35-45% on average, can exceed 65%. Sound familiar?
  • IT creates most of their own problems
  • Most of downtime spent diagnosing cause of problem, only 20% spent in actual repair
  • Don’t know who made change or why
  • Changes undo other changes
  • Lack of confidence in IT

Page 23: Visible Ops

Visible Ops

• For each fragile IT asset
  – Reduce / eliminate access
  – No changes unless explicitly authorized
  – Communicate change lockdown to stakeholders
  – Allow change only during specified time window
  – Enforce / reinforce CM process
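
The two lockdown rules above (explicit authorization, fixed time window) can be sketched as a single gate. This is only an illustration: the weekend 01:00-05:00 window, the `WINDOWS` table and the function name `may_apply_change` are assumptions, not anything prescribed by Visible Ops.

```python
from datetime import datetime, time

# Hypothetical policy: changes allowed Saturday (5) and Sunday (6),
# from 01:00 up to but not including 05:00 local time.
WINDOWS = {
    5: (time(1, 0), time(5, 0)),
    6: (time(1, 0), time(5, 0)),
}


def may_apply_change(when: datetime, authorized: bool, windows=WINDOWS) -> bool:
    """Allow a change only if it is authorized AND falls inside a change window."""
    if not authorized:
        return False
    window = windows.get(when.weekday())
    if window is None:
        return False  # no window defined for this weekday
    start, end = window
    return start <= when.time() < end


if __name__ == "__main__":
    saturday_2am = datetime(2024, 1, 6, 2, 0)
    print(may_apply_change(saturday_2am, authorized=True))
```

An emergency change procedure would need its own, separately audited path rather than a flag that bypasses this check.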

Page 24: Visible Ops

Visible Ops

– Effective CM plays critical role in stabilizing IT
– Responsibility & accountability for everyone
– Use automated detection tools like Tripwire to ID unauthorized changes
  • For each unauthorized change
    – Who did it?
    – What was changed?
    – Can it be rolled back?
    – How to prevent reoccurrence?
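
Answering "what was changed?" presupposes a way to notice the change at all. Below is a minimal sketch of the general idea behind file-integrity checking: hash every file, compare against a stored baseline, then drop the paths covered by approved change requests. It is not Tripwire's implementation or interface; the function names and the `authorized_paths` set are illustrative assumptions.

```python
import hashlib
from pathlib import Path


def snapshot(root: Path) -> dict[str, str]:
    """Map each file under root (relative POSIX path) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def detect_changes(baseline: dict[str, str], current: dict[str, str]) -> dict[str, str]:
    """Return {path: 'added' | 'removed' | 'modified'} for every difference."""
    changes = {}
    for path in baseline.keys() | current.keys():
        if path not in baseline:
            changes[path] = "added"
        elif path not in current:
            changes[path] = "removed"
        elif baseline[path] != current[path]:
            changes[path] = "modified"
    return changes


def unauthorized(changes: dict[str, str], authorized_paths: set[str]) -> dict[str, str]:
    """Keep only the changes that no approved change request covers (assumed input)."""
    return {p: kind for p, kind in changes.items() if p not in authorized_paths}
```

A hash comparison shows what changed, not who changed it; that still has to come from access logs and the change request record.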

Page 25: Visible Ops

Visible Ops

• CM key to success is
  – Create culture of accountability
  – Enforce maintenance windows
  – Manage by facts, not beliefs
  – # of acceptable unauthorized changes = 0

Page 26: Visible Ops

Visible Ops

– Eliminating changes decreases outages, reducing the amount of unplanned work
– Frees up resources for productive work
– Create a Change Advisory Board (CAB) to manage changes
  • Accept that business events cause IT change events
  • All major IT groups on CAB + Senior Mgmt.
  • Create emergency change procedure & use it sparingly

Page 27: Visible Ops

Visible Ops

– Implement change request tracking system
  • Document & track all changes throughout their lifecycle
  • Automated tools are available
  • Collect change control metrics & generate reports
– CAB weekly meetings to authorize changes
  • Goal is maximum effectiveness with minimum bureaucracy
  • Use meeting agenda template (pp. 33-34)

Page 28: Visible Ops

Visible Ops

– For each change request do complete analysis of impacts
  • Who, What, When, How, & What If questions
  • Rank requests by priority
  • Identify change dependencies
  • Major risks involved
  • Rollback strategy
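
One way to make the impact analysis above concrete is a change request record that carries those answers, plus a helper that builds the CAB agenda. This is a hedged sketch: the field names, the "1 = most urgent" priority scale and the rule that a request without a rollback plan is sent back are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeRequest:
    summary: str
    requester: str            # who
    target: str               # what
    scheduled_for: str        # when (free text in this sketch)
    priority: int             # assumed scale: 1 = most urgent
    risks: list[str] = field(default_factory=list)
    depends_on: list[str] = field(default_factory=list)
    rollback_plan: str = ""

    def missing_items(self) -> list[str]:
        """List what the CAB would still need before it can decide."""
        missing = []
        if not self.scheduled_for.strip():
            missing.append("schedule")
        if not self.rollback_plan.strip():
            missing.append("rollback plan")
        return missing


def cab_agenda(requests: list[ChangeRequest]):
    """Split requests into (ready, ranked by priority) and (returned to requester)."""
    ready = sorted((r for r in requests if not r.missing_items()),
                   key=lambda r: r.priority)
    returned = [r for r in requests if r.missing_items()]
    return ready, returned
```

Because `sorted` is stable, requests with equal priority stay in submission order.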

Page 29: Visible Ops

Visible Ops

– Effective CM
  • Post-implementation reviews
  • Measure success rate & learn from it
  • Everyone attends meetings
  • Document all change outcomes
– Ineffective CM
  • Authorize changes without rollback plan
  • Rubber stamping
  • Outright waivers

Page 30: Visible Ops

Visible Ops

– Primary reason for any process failure is
  • Lack of accountability
  • Lack of strong management support
– General perception of nimbleness & agility is a delusion

Page 31: Visible Ops

Visible Ops

• Stabilize The Patient Benefits
  – Higher availability
  – Less firefighting
  – Higher change success rate
  – CM process that’s efficient & effective
  – Increased MTBF due to change windows
  – Decreased MTTR due to CM
  – Increased individual accountability
  – Improved organizational communication

Page 32: Visible Ops

Visible Ops

• Phase 2: Catch & Release / Find Fragile Artifacts
  – Create & maintain inventory of IT assets (esp. production assets)
  – Symptoms
    • How to start building a CMDB?
    • Knowledge is individual, not organizational
    • Uncontrolled changes cause unknown configuration states
    • Explosion in # of configurations

Page 33: Visible Ops

Visible Ops

– Tasks
  • Senior staff to inventory all managed assets
  • Thoroughly document all assets (p. 42 has checklist of questions)
  • Tag the fragile assets
    – ID those requiring most unplanned work
    – ‘Do Not Touch’
    – Focus efforts on unstable assets
  • Prevent new builds until inventory completed
    – Exceptions only via CAB
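
Identifying the assets that require the most unplanned work amounts to a simple tally. A small sketch, assuming a hypothetical log of (asset, hours) entries for unplanned work; the log format and the function name are not from Visible Ops.

```python
from collections import Counter


def rank_fragile_assets(unplanned_work_log, top_n=3):
    """Sum unplanned hours per asset and return the top_n as (asset, hours) pairs.

    unplanned_work_log: iterable of (asset_name, hours) entries (assumed format).
    """
    hours = Counter()
    for asset, spent in unplanned_work_log:
        hours[asset] += spent
    return hours.most_common(top_n)


if __name__ == "__main__":
    log = [("db01", 4), ("web02", 1), ("db01", 6), ("mail01", 3)]
    for asset, spent in rank_fragile_assets(log, top_n=2):
        print(f"{asset}: {spent} h unplanned -> tag as fragile")
```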

Page 34: Visible Ops

Visible Ops

– Benefits
  • Service catalog documenting most critical services being supported
  • CMDB containing all CI
    – Supports queries / ad hoc reporting based on metrics
  • Prioritized list of projects to replace fragile assets
  • More organizational learning

Page 35: Visible Ops

Visible Ops

• Phase 3: Repeatable Build Library
  – Create library of repeatable builds focusing first on fragile configurations
  – A datacenter of Golden Images
  – Enables replace instead of repair

Page 36: Visible Ops

Visible Ops

– Symptoms
  • Configurations are unique, irreplaceable works of art
  • Production configurations evolve, rendering release configuration obsolete
  • More configurations require more specialized knowledge about each configuration
  • Patches cause crashes
  • Patches not incorporated into builds

Page 37: Visible Ops

Visible Ops

– Create release management team
  • Operate earlier in cycle to reduce defects in production
  • Engineer repeatable builds
    – Require constant time to rebuild
    – Reduces configuration variance
    – Junior staff does the builds
    – Frees senior staff for more proactive tasks
  • Goal is to reduce # of configurations while increasing their shelf life

Page 38: Visible Ops
Page 39: Visible Ops

Visible Ops

– Create repeatable build process
  • Generates Golden Builds (master images)
    – Thoroughly planned, tested and approved
    – Kept current with new patches & upgrades
    – Stored in definitive software library (DSL)
      » Along with associated assets (documentation, licenses, keys, etc.)
    – DSL is SW Fort Knox

Page 40: Visible Ops

Visible Ops

– Creating a DSL
  • ID lowest common IT asset denominators
    – Operating systems, applications, business rules & data
  • Create build catalog of components that must be standardized
  • Create a repeatable build process for each item in catalog
  • Isolate build network from other networks
  • Place master builds in DSL
  • Keep master builds current

Page 41: Visible Ops
Page 42: Visible Ops

Visible Ops

• Designate a DSL manager
• Create a DSL approval process for submitting master builds
• Keep all copies under revision control
• Initial 1 year amnesty for all running applications
  – Replace with certified master builds as they become available
• Weed out unnecessary master builds

Page 43: Visible Ops

Visible Ops

– Establish acceptance process between production and release teams
  • Release team designs and builds configurations
  • Production team accepts and deploys
  • Production must get CAB approval prior to deployment
– Can’t put any configuration into production unless accepted by production team
  • Production only tests configurations in DSL
– For security reasons, developers not part of build process
  • Could insert malicious code

Page 44: Visible Ops

Visible Ops

– Patching
  • Belongs in release management
  • Patch and Pray to be avoided
  • Successful IT organizations patch less often
  • Apply / test patches before releasing to production
  • Patch to production system may be overwritten by subsequent build
  • Use detective control tools to ensure build integrity

Page 45: Visible Ops

Visible Ops

– Benefits
  • Build library cuts unplanned work to < 15%
  • Release management team with well defined roles
  • Process for repeatable builds
  • Can repair by rebuilding
  • Free up senior staff resources
  • Tighter integration between release & production
  • Reduced # of configurations
  • Reduced patch risks

Page 46: Visible Ops

Visible Ops

• Phase 4: Continual Improvement
  – Goal is to collect & use metrics to improve performance
  – Simply adopting best practices = competitive parity… not good enough

Page 47: Visible Ops

Visible Ops

– Can’t manage what you can’t measure
  • Sound familiar?
– Key IT metrics relate to availability
  • MTBF & MTTR
  • Affected by factors in release & controls
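
How MTBF, MTTR and availability relate can be shown with a short calculation. The sketch assumes the common convention MTBF = uptime / number of outages, MTTR = downtime / number of outages, and availability = uptime / total time (equivalently MTBF / (MTBF + MTTR)). The slides do not define the formulas, so this convention and the function name are assumptions.

```python
from datetime import datetime, timedelta


def availability_metrics(period_start: datetime, period_end: datetime, outages):
    """Compute MTBF, MTTR and availability for one observation period.

    outages: list of (start, end) datetime pairs, assumed non-overlapping
    and fully inside the period.
    """
    total = period_end - period_start
    downtime = sum((end - start for start, end in outages), timedelta())
    uptime = total - downtime
    count = len(outages)
    if count == 0:
        # No failures observed: MTBF and MTTR are undefined for this period.
        return {"mtbf": None, "mttr": None, "availability": 1.0}
    return {
        "mtbf": uptime / count,          # timedelta
        "mttr": downtime / count,        # timedelta
        "availability": uptime / total,  # float between 0 and 1
    }


if __name__ == "__main__":
    start = datetime(2024, 1, 1)
    end = start + timedelta(days=30)
    outages = [
        (datetime(2024, 1, 5, 10), datetime(2024, 1, 5, 11)),   # 1 hour
        (datetime(2024, 1, 20, 2), datetime(2024, 1, 20, 5)),   # 3 hours
    ]
    print(availability_metrics(start, end, outages))
```

With two outages totalling 4 hours in a 720-hour month, MTBF is 358 hours, MTTR is 2 hours, and availability is 716/720, about 99.4%.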

Page 48: Visible Ops

Visible Ops

– Release
  • Are we efficient at provisioning infrastructure?
– Controls
  • Are we making good change management decisions?
– Resolution
  • Are we quickly diagnosing and fixing problems?
– IT needs metrics for all 3 process areas

Page 49: Visible Ops

Visible Ops

– Release metrics
  • Time to provision good build
  • # of build revisions before accepted
  • Build shelf life
  • % systems that are good builds
  • % builds with security sign-off
  • # builds rushed into production
  • Release Engineers : SysAdmin ratio
    – Higher is better

Page 50: Visible Ops

Visible Ops

– Controls metrics
  • # authorized changes / week
  • # actual changes / week
    – Should equal # authorized
  • # unauthorized changes
    – Should be zero
  • Change success rate
    – Should be > 99%
  • # outages
  • # emergency changes per CAB

Page 51: Visible Ops

Visible Ops

• # ‘special’ changes outside CAB
• # ‘business as usual’ changes
• CM overhead in man-hours
• Changes submitted vs. reviewed
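
Several of the controls metrics can be derived from a weekly list of change records. A minimal sketch follows; the `ChangeRecord` fields and function names are hypothetical, and only the two targets (zero unauthorized changes, success rate above 99%) come from the slides.

```python
from dataclasses import dataclass


@dataclass
class ChangeRecord:
    change_id: str
    authorized: bool
    successful: bool
    emergency: bool = False


def controls_metrics(records):
    """Summarize one week of change records into controls metrics."""
    total = len(records)
    authorized = sum(r.authorized for r in records)
    successful = sum(r.successful for r in records)
    return {
        "actual_changes": total,
        "authorized_changes": authorized,
        "unauthorized_changes": total - authorized,
        "emergency_changes": sum(r.emergency for r in records),
        "success_rate": successful / total if total else None,
    }


def target_violations(metrics):
    """Check the two targets stated on the slides."""
    problems = []
    if metrics["unauthorized_changes"] > 0:
        problems.append("unauthorized changes should be zero")
    rate = metrics["success_rate"]
    if rate is not None and rate <= 0.99:
        problems.append("change success rate should be > 99%")
    return problems
```

Outages and CM overhead in man-hours would come from other sources (incident and time tracking) and are left out of this sketch.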

– Resolution metrics
  • MTTR
  • MTBF