Building Large Scale Services - usenix...
TRANSCRIPT
Building Large Scale Services
PRESENTED BY Jennifer Davis • November 8, 2013
Twitter: @sigje Email: [email protected]
SysAdmin Controls all the things
11/11/13 3
Shared Dependencies
The Reality…
The Dream…
How?
Define Core Principles
• Common
  › Collaboration across teams, companies, industry, define standards
  › Incident, Problem, Change, Config, Release management
• Distinct
  › Specifics to an application or service
  › Availability, Service, Business Continuity, Capacity
Kill the Myths
• Stupid User
• System Admin == Operator
[Word cloud: SKILLS: failing gracefully, puppet, ruby, perl, NoSQL, operability, security, MySQL, Unix, TCP/IP, bash, Chef]
Kill the Myths
• Stupid User
• System Admin == Operator
• Words have a common universal implicit meaning
Learn to Modulate your Message
[Diagram labels: Team, Manager, Customer]
Team
• People working towards common goal.
• Different roles.
• Different views.
• Same objectives.
Image Credit: Kyle Latino
Team
Suggestion: Don’t talk about the “devs” request, talk about Elaine’s request.
Suggestion: Verify that your team has the same vision.
Understand the vision.
• Are there other options, open source or not within the company?
• Are there other options outside the company?
• Is EVERYONE on the same page about what the service is?
Vision Statement
• Clear statement about the problem that the service is solving.
  › Direction
  › Identity management
  › Team cohesion
New product? Be part of creating that vision!
Sherpa’s Vision
.. Distributed replicated eventually consistent key value store that had a focus on scalability ..
My Job
• Examine software
• Define risk
• Communicate cost of risks
• Mitigate risks
• Identify events
• Manage events
Fragile Platforms are Bad.
Change is inevitable
• Products pivot based on needs.
• Requirements change and evolve.
• Know core issues.
Know Core Issues
• Limit the scope of focus.
• Focus on the biggest priorities.
  › Understand Development Methodology: Waterfall, Scrum, ?
  › Identify the key “time” elements.
  › Talk to them. Identify their key terms. “Enhancements”, “Defects”
  › Establish the “Top” list.
Create checklists
• Not because people are dumb.
• Not only because of automation.
• When things break, knowing what needs focus.
• During normal maintenance, can identify “not OK”.
  › Audit checklists for deployment through staging environment.
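A checklist that can be audited is easier to trust during an incident. The following is a minimal sketch of that idea, not something from the talk: the names `CheckItem`, `audit`, and `not_ok`, and the three staging checks, are all hypothetical placeholders. Real checks would query the actual service.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class CheckItem:
    """One checklist entry: a human-readable name plus a check function."""
    name: str
    check: Callable[[], bool]  # returns True when the item is OK


def audit(items: List[CheckItem]) -> List[Tuple[str, bool]]:
    """Run every item and record OK / not OK.

    A check that raises is recorded as not OK instead of stopping the audit.
    """
    results = []
    for item in items:
        try:
            ok = bool(item.check())
        except Exception:
            ok = False
        results.append((item.name, ok))
    return results


def not_ok(results: List[Tuple[str, bool]]) -> List[str]:
    """Names of the items that need focus."""
    return [name for name, ok in results if not ok]


# Hypothetical staging checklist with stand-in checks.
staging_checklist = [
    CheckItem("config file present", lambda: True),
    CheckItem("service answers health check", lambda: False),
    CheckItem("log directory writable", lambda: 1 / 0 > 0),  # simulates a crashing check
]

results = audit(staging_checklist)
print(not_ok(results))
```

Running every item even when one check crashes keeps the "not OK" list complete, which matters most when things are already broken.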
Know Outputs
• Identify components.
• Well defined protocols between components.
• Expected Inputs.
• Expected Outputs.
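One way to make expected inputs and outputs explicit is to write them down as a contract and check messages against it. This is only a sketch under assumptions: the field names (`key`, `ttl_seconds`, `status`, `latency_ms`) and the helper `violations` are invented for illustration and do not describe any real service.

```python
from typing import Any, Dict, List

# Hypothetical contract for one component: field name -> expected type.
EXPECTED_INPUT = {"key": str, "ttl_seconds": int}
EXPECTED_OUTPUT = {"status": str, "latency_ms": float}


def violations(message: Dict[str, Any], contract: Dict[str, type]) -> List[str]:
    """Return contract violations; an empty list means the message conforms."""
    problems = []
    for field, expected_type in contract.items():
        if field not in message:
            problems.append(f"missing field: {field}")
        elif not isinstance(message[field], expected_type):
            problems.append(
                f"wrong type for {field}: expected {expected_type.__name__}, "
                f"got {type(message[field]).__name__}"
            )
    for field in message:
        if field not in contract:
            problems.append(f"unexpected field: {field}")
    return problems


print(violations({"key": "user:42", "ttl_seconds": "60"}, EXPECTED_INPUT))
```

Returning a list of problems, instead of failing on the first one, gives the person debugging a boundary between two components the whole picture at once.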
Know State Transitions Explicitly.
• When component is installed but not ready
• When the colo is going away
• Go through What If Scenarios.
  › Document them.
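Writing state transitions down explicitly can be as simple as a table of allowed moves. The sketch below is an illustration only: the states `DRAINING` and `RETIRED`, the `ALLOWED` table, and the `transition` helper are assumptions, not states named in the talk beyond "installed but not ready" and the colo going away.

```python
from enum import Enum


class State(Enum):
    INSTALLED = "installed"  # on disk, but not yet ready to serve
    READY = "ready"
    DRAINING = "draining"    # assumed state, e.g. while the colo is going away
    RETIRED = "retired"      # assumed terminal state


# Every allowed transition is listed; anything absent is illegal by definition.
ALLOWED = {
    State.INSTALLED: {State.READY, State.RETIRED},
    State.READY: {State.DRAINING},
    State.DRAINING: {State.READY, State.RETIRED},
    State.RETIRED: set(),
}


def transition(current: State, target: State) -> State:
    """Return the new state, or raise if the move was never documented."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target


state = transition(State.INSTALLED, State.READY)
print(state.value)
```

Each "what if" scenario then becomes a question about one row of the table: is this move listed, and who handles it when it happens?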
Know choke points explicitly.
• Memory
• Disk
• Bandwidth
Now and in 6 months. JIT?
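Comparing "now" with "in 6 months" can be sketched with simple arithmetic. The example assumes constant linear growth, which real workloads rarely follow exactly, and every number in `resources` is made up for illustration.

```python
def projected_usage(used: float, growth_per_month: float, months: int = 6) -> float:
    """Usage after `months`, assuming constant linear growth."""
    return used + growth_per_month * months


def months_until_full(used: float, capacity: float, growth_per_month: float) -> float:
    """Months until usage reaches capacity under the same linear assumption."""
    if used >= capacity:
        return 0.0
    if growth_per_month <= 0:
        return float("inf")
    return (capacity - used) / growth_per_month


# Hypothetical figures: (used now, capacity, growth per month).
resources = {
    "memory_gb": (48.0, 64.0, 4.0),
    "disk_gb": (600.0, 1000.0, 50.0),
    "bandwidth_mbps": (300.0, 1000.0, 25.0),
}

for name, (used, capacity, growth) in resources.items():
    in_six = projected_usage(used, growth)
    flag = "CHOKE POINT" if in_six >= capacity else "ok"
    left = months_until_full(used, capacity, growth)
    print(f"{name}: now={used}, in 6 months={in_six}, months left={left}, {flag}")
```

With these invented numbers only memory crosses its capacity inside the six-month window, so that is where attention would go first.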
Failure will happen.
• There are no 0 failure systems.
• “Give me the brain” documentation so that anyone can be the brain.
• Repeatable/Reliable failure handling.
• Run fire drills. Really.
System Administration is Gardening.
• No guarantee of resources.
• Only guarantee is change.
System Administration is Gardening.
• Nurture relationships.
  › Be authentic.
  › Be trusting and trustworthy.
  › Have integrity.
Success At Scale is Collaboration & Cooperation across Teams.
Decreasing Value
[Chart: # of Support Engineers by month (Jan, Apr, Jul, Oct), y-axis 0 to 8]
[Chart: # of Support Engineers by month (Jan, Apr, Jul, Oct), y-axis 0 to 6]
Documentation is not the cure.
• Documentation doesn’t guarantee understanding.
  › Operations Sandbox Environment
• Don’t spend time at the end documenting.
Summary
Be Expendable. Feed your brain.
Acknowledgements
• http://www.flickr.com/photos/levork
• http://www.flickr.com/photos/puggles
• http://www.flickr.com/photos/byteorder
• http://www.flickr.com/photos/egoant
• http://www.flickr.com/photos/happymonkey
• Kyle Latino
• Greg Connor