rebooting the team - surge 2013

Lessons Learned in an Introspective Year Rebooting the Team Fran Fabrizio IT Director, Minnesota Population Center Twitter: @franfabrizio Email: [email protected]

Upload: fran-fabrizio

Post on 26-Jun-2015

DESCRIPTION

A year ago, our software development team ended up in a funk. Simply put, we had some bugs in our processes, relationships and environment that were preventing us from being the best team we could be. So we did what any good dev team does when it encounters bugs: we deconstructed the problems, determined the root causes and implemented some fixes. I’ll share our story and discuss the lessons we learned along the way. You’ll take away ideas and tools that can help you explore these critical, but often tricky, topics in order to prepare your team to really scale.

TRANSCRIPT

Page 1: Rebooting the Team - Surge 2013

Lessons Learned in an Introspective Year

Rebooting the Team

Fran Fabrizio, IT Director, Minnesota Population Center

Twitter: @franfabrizio Email: [email protected]

Page 2: Rebooting the Team - Surge 2013

Motivation for Rebooting the Team

DISCLAIMER

This is not a talk specific to scalability.

So, why am I here?

It goes like this...

2

Page 12: Rebooting the Team - Surge 2013

Motivation for Rebooting the Team

SCALABILITY EDITION!

The Surge organizers contacted me and said “We came across your talk about rebooting your team. We’re filling out our organizational scalability track - would your talk work well there?”

This was an interesting question. I embarked upon a “pondering walk”...

12

Page 17: Rebooting the Team - Surge 2013

In the next 50 minutes we’ll review our case study, including:

• Why we need reboots
• Gut feelings --> specific symptoms
• The collaborative process used to get to the root causes
• Leveraging the insight to build consensus for change and action
• Where we are and what we learned
• Q&A

17

Page 18: Rebooting the Team - Surge 2013

Audience Takeaway

Techniques, tools, activities, processes, ideas, sparks... stuff you can use to help figure out what’s ailing your team, whether your team needs a reboot, and how to collaboratively enact lasting change.

Not going to be a silver bullet - I’m giving you design patterns. You will have homework!

18

Page 19: Rebooting the Team - Surge 2013

What’s a Reboot?

Page 20: Rebooting the Team - Surge 2013

What’s a reboot?

A conscious decision to engage in deeper, more radical change than just incremental improvements.

A reboot typically impacts staff structure, work processes and communication patterns for your team and organization.

20

Page 21: Rebooting the Team - Surge 2013

Why Do We Need Reboots?

21

[Gauge: DEV TEAM POWER METER, 100% to 0%]

Team firing on all cylinders, shipping great code, everyone contributing, team greater than sum of its parts!

Page 22: Rebooting the Team - Surge 2013

Why Do We Need Reboots?

22

[Gauge: DEV TEAM POWER METER, 100% to 0%]

Enlightened organizations keep the needle at 100% by proactively anticipating the changes in their environment and responding to them gracefully over time.

Page 23: Rebooting the Team - Surge 2013

Why Do We Need Reboots?

23

[Gauge: DEV TEAM POWER METER, 100% to 0%]

Most organizations are not that enlightened. The needle begins to slide as the org is slow to respond to changes in its environment.

Page 24: Rebooting the Team - Surge 2013

Why Do We Need Reboots?

24

[Gauge: DEV TEAM POWER METER, 100% to 0%]

By the time there is awareness and consensus for change, the amount of change needed is often too great to achieve incrementally.

Page 25: Rebooting the Team - Surge 2013

Wetware Reboots are Hard!

• Wetware is wonderfully, messily analog, nondeterministic, and mysterious - it doesn’t respond predictably to change.

• Reboots are costly and disruptive - we wouldn’t do them if we didn’t have to

• An engineer’s wetware skillset is typically less developed than their software/hardware skills

25

Page 26: Rebooting the Team - Surge 2013

The Dev Manager’s Mission

Something feels wrong. The team’s not working as well as it used to. You’re not quite sure what it is yet, or more importantly why it’s happening, or how to fix it. But you’re the one everyone’s looking to for a fix.

How do you get to the whats, whys, and hows? You need...

26

Page 27: Rebooting the Team - Surge 2013

Organizational Debugging

A framework for turning observed behaviors and stakeholder input into a clear understanding of where your team has deficiencies and how to address them.

1. Observe behavior and get stakeholder views
2. Distill into themes
3. Dig until you converge on root causes
4. Execute action plans for each root cause

27

Page 28: Rebooting the Team - Surge 2013

Do we need a reboot?

[Diagram labels: Symptoms, Smells, Themes, Root Causes, Actions; Concrete, Abstract; Diagnosis, Treatment; Decision Point]

Page 29: Rebooting the Team - Surge 2013

How is this Different than Normal Management?

29

It’s amplified.

Page 30: Rebooting the Team - Surge 2013

How to Approach a Reboot

• Respect the day job
• Change is scary. Be consistent, overcommunicate, and check in often
• You are a facilitator: listen more than talk
• HRT - Humility, Respect and Trust - is essential. (from the book Team Geek)

30

Page 31: Rebooting the Team - Surge 2013

Why HRT is so Important

• When organizational debugging is done collaboratively and with HRT, it produces momentum for change.

• When it’s not, it produces resentment and friction, and gets in the way of the organization executing needed change.

31

Page 32: Rebooting the Team - Surge 2013

Observing the Symptoms

Page 33: Rebooting the Team - Surge 2013

Observing the Symptoms

• Recall this typically starts as a gut feeling that “something’s not right”

• Start by writing down a list of symptoms that are giving you this gut feeling.

• Then put on your facilitator hat and start asking others (inside and outside your team) targeted but open-ended questions

33

Page 34: Rebooting the Team - Surge 2013

Good Questions to Ask

• How did your week go?
• What are your biggest pain points right now?
• What do you think are the most important things for us to be doing / thinking about?
• Do you need anything from me?
• How are things with <insert customer>?

34

Page 35: Rebooting the Team - Surge 2013

Listen Mindfully

As you’re having these conversations, listen for symptoms...

“Well, I spent the first half of the week setting up my dev environment on my new system, and then I had to put out a lot of fires on that one project. By the time I had any breathing room it was Friday and I couldn’t get any time with Mark, so I didn’t do any new feature development on my main project. Have you heard from the product team? I was wondering whether they want that new UI widget now, or if they want us to work on optimizing the query performance first?...”

35

Page 36: Rebooting the Team - Surge 2013

Avoid

• What do you think is wrong with the team?
– Sets off alarm bells and people feel compelled to answer even if they weren’t thinking it
• Try to avoid diving into solutions just yet
– Suggestions fine, but there’s risk of treating symptoms and not the root cause at this stage.
• Don’t go too deep
– First pass over your stakeholders, just getting a feel for things right now. Should feel casual.

36

Page 37: Rebooting the Team - Surge 2013

You’ve been committed to things without your knowledge

Ship dates slip

Expected to do 5 things at once

No 1:1 meetings

Small changes take longer than they should

Setting up a dev environment takes 3 days

No one person knows how to do a full deployment

Every deployment results in a big mess

People miss key pieces of info

Sick time is spiking

You get a pit in your stomach when you walk in the door

New requirements appear late in projects

Staff working on similar problems not collaborating

I can’t move code between projects easily.

Staff members reluctant to share knowledge

Documentation is out of date

Status meetings turn into bitchfests

People lament the quality of their work

People are quitting!

Can’t say no to anything

Customers don’t trust what IT says.

Estimates are unreliable

Projects getting later and later

Secondary projects fall through the cracks

Build is always broken!

You’re on a death march, and everyone knows it.

People are struggling with their tools.

Every decision is made by committee

No consensus on priorities

Nobody knows what ‘done’ means

Team focusing on peripheral issues

37

Job descriptions no longer match reality

Surprises are common

Uptime is decreasing

Recruitment is getting more difficult

Long term goals aren’t getting closer

Exploratory work has stopped

I cannot determine the status of our systems.

I’m doing all the same things I was a year ago.

Vacations are disruptive

It’s too quiet!

Page 38: Rebooting the Team - Surge 2013

Grouping Symptoms

• Go back to your symptom list and see if they group into related themes.

• Themes are “anchors” for discussion - use them rather than focusing on specific symptoms, which can get bogged down in the weeds

38
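The grouping step can be sketched in a few lines of Python. This is only an illustration: the symptom-to-theme pairings below are assumptions made for the example, not the groupings the team actually arrived at, and the names `TAGGED_SYMPTOMS`, `group_by_theme` and `rank_themes` are hypothetical.

```python
from collections import defaultdict

# Hypothetical tagging: each symptom is paired with the theme it seems to
# belong to. These pairings are illustrative guesses, not the talk's results.
TAGGED_SYMPTOMS = [
    ("Setting up a dev environment takes 3 days", "Routine things are complicated"),
    ("Small changes take longer than they should", "Routine things are complicated"),
    ("No one person knows how to do a full deployment", "Single points of failure abound"),
    ("Vacations are disruptive", "Single points of failure abound"),
    ("Ship dates slip", "Overpromising/Underdelivering"),
    ("Estimates are unreliable", "Overpromising/Underdelivering"),
]


def group_by_theme(tagged):
    """Return {theme: [symptoms]}, keeping symptoms in input order."""
    themes = defaultdict(list)
    for symptom, theme in tagged:
        themes[theme].append(symptom)
    return dict(themes)


def rank_themes(themes):
    """List theme names, the ones backed by the most symptoms first."""
    return sorted(themes, key=lambda name: len(themes[name]), reverse=True)


if __name__ == "__main__":
    themes = group_by_theme(TAGGED_SYMPTOMS)
    for name in rank_themes(themes):
        print(f"{name} ({len(themes[name])} symptoms)")
        for symptom in themes[name]:
            print(f"  - {symptom}")
```

Ranking by symptom count is just one possible way to decide which theme to discuss first; a theme with few symptoms can still matter more.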

Page 39: Rebooting the Team - Surge 2013

Teams are too silo’ed

Page 40: Rebooting the Team - Surge 2013

Ops Tools Deficient

Page 41: Rebooting the Team - Surge 2013

Overpromising / Underdelivering

Page 42: Rebooting the Team - Surge 2013

Themes that We Found

Unfulfilled team members

Bad office vibe

Routine things are complicated

Something’s always on fire

Lack of trust in the developers

Every day is a surprise party

Ops Tools Deficient

Single points of failure abound

Overpromising/Underdelivering

Leadership is distracted

The team is too silo’ed

42

Page 43: Rebooting the Team - Surge 2013

Getting to Root Causes and Solutions

Page 44: Rebooting the Team - Surge 2013

Rooting out Root Causes

Second, much deeper pass with stakeholders.

Goals:
1. Build awareness that these problems exist
2. Achieve consensus re: root causes
3. Create momentum for change

44

Page 45: Rebooting the Team - Surge 2013

The Estimates Story

45

Page 46: Rebooting the Team - Surge 2013

Peeling off the Layers

• “IT isn’t good at estimates” really meant “We need to work towards more transparency and communication with our customers and management.”

• Deeper problem: communication disconnect between dev, management, and our customers. We need to treat this problem, not the “IT isn’t good at estimates” symptom.

46

Page 47: Rebooting the Team - Surge 2013

The (Not So) Big Secret!

When you dig into a wetware problem, you’re always going to find at least one of these things:

TRUST   COMMUNICATION   PROCESS

But “We suck at communicating” is not actionable.

Get to more specific root causes.

47

Page 48: Rebooting the Team - Surge 2013

Engaging the Stakeholders

To get the insight you need, people must be engaged in a way that’s meaningful to them.

Tailor approach, content and detail.

Understand their motivations, what they can uniquely contribute, and meet them there!

48

Page 49: Rebooting the Team - Surge 2013

Engaging the Dev Team

• Their motivations:
– Autonomy, Mastery and Purpose
– Daniel Pink’s 2009 TED talk www.danpink.com/ac/ted-talk/
– Build awesome stuff with awesome people
• What they can contribute:
– They do root cause analysis all the time
– Evidence-based view of the world
– And of course, deep understanding of tech bits

49

Page 50: Rebooting the Team - Surge 2013

Engaging the Dev Team

• Your Goals:
– Give the team as much ownership of the reboot as possible
  • Only fair, this is where the brunt of reboot change will land
– Give visibility to long term strategy and external dependencies on the team
  • “State of the Outside World”

50

Page 51: Rebooting the Team - Surge 2013

Dev Team Engagement Tactics

• Create dedicated time/space to foster group dialogue about the themes and root causes
– Focus Friday
– Forums/chat rooms
• “Squishy” book discussion. Example: Team Geek
• Meet privately with individuals regularly (should be doing this already).

51

Page 52: Rebooting the Team - Surge 2013

Engage Management

• Their motivations: Exploiting market opportunities, mitigating risk, efficient use of resources - “macro things”
• What they can contribute:
– org strategy
– deep understanding of what other areas of the org are doing
– better awareness of external opportunities and pressures

52

Page 53: Rebooting the Team - Surge 2013

Engage Management

• Your goals: Provide visibility to issues, incorporate their big picture perspectives, obtain support for change
• Example Activities:
– Roundtables - ask the “macro” questions
– Vision & perception interview with key people
– Use concise, powerful tools, such as...

53

Page 54: Rebooting the Team - Surge 2013

Management Engagement Tactics

2013 MPC IT MAJOR PROJECT ROADMAP, v1.0 [2/19/2013]

[Roadmap chart. Columns: Q1, Q2, Q3, Q4. Rows: INFRASTRUCTURE, PROCESS, FEATURE DEV. Legend: IPUMS, TerraPop, Miscellaneous, Multiple Projects.]

Items shown on the chart:
• NHGIS
• Rails 2 >> 3 Conversions
• Ruby 1.8 >> 1.9
• ATUS >> IPUMS Integration
• TerraPop Prototype
• Minimal Aggregation
• Nominal Integration
• Interpolation
• Solaris >> Linux
• Storage Upgrade
• Bibliography Rewrite
• Admin Core Database
• Archive Upgrade
• Subversion >> Git Conversions
• TeamCity >> Jenkins Conversions
• Separate Webapp from Extract
• IDHS Metadata Tools
• Time Series Support
• TerraPop Production
• IDHS Webapp Development
• User Authentication & Authorization Service
• MDT Metadata Tool Development
• General IPUMS Feature Development
• Static >> Drupal Migrations
• Column-Store Database Migration
• SDA Extensions & Modifications
• NHGIS Data Ingest
• 1960
• Static Asset Management
• Systems Monitoring and Reporting
• Hadoop & Big Data Technologies R&D

54

Page 55: Rebooting the Team - Surge 2013

55

Management Engagement Tactics

Develop a Product Vision Picture

Page 56: Rebooting the Team - Surge 2013

Engaging the Customer

• Their Motivations:
– Want solutions which solve their problems
– Want to have transparency into the process
• What they can contribute:
– User centric design and prioritization
– More honest feedback - they don’t have as many relationships or as much baggage to protect

56

Page 57: Rebooting the Team - Surge 2013

Engaging the Customer

• Your Goals:
– Bring the customer closer into the team
– Get them to help your org prioritize
– Make them happy :-)

57

Page 58: Rebooting the Team - Surge 2013

Customer Engagement Tactics

• Roundtables with our customer teams
• Retrospectives after each major release
• Example tool: Mapping the communication flow

58

Page 59: Rebooting the Team - Surge 2013

Example of a Communications Map

59

Page 60: Rebooting the Team - Surge 2013

Converging on Root Causes

Dev Team Insight: “We get a lot of requests that pull us away from our core projects.”

Management Insight: “We want to help campus folks who are doing demographic research, but we don’t have a process for vetting or prioritizing requests.”

Customer Insight: “The MPC agreed to do my research web site, but when I call over there it seems that there’s always something more pressing.”

Suggests the root cause might be Lack of Focus on Mission

60

Why do we think “Something’s Always on Fire?”
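The convergence idea on this slide, where a root cause gains weight when every stakeholder group independently points at it, can be sketched as follows. This is a rough Python illustration under stated assumptions: the stakeholder labels, the `INSIGHTS` data and the function name `converged_causes` are hypothetical, and mapping a quote to a candidate root cause remains the facilitator's judgment call.

```python
# Hypothetical labels for the three stakeholder groups discussed in the talk.
STAKEHOLDERS = frozenset({"dev team", "management", "customer"})

# Each entry: (who said it, the candidate root cause it was mapped to).
# The mapping is assumed here purely for illustration.
INSIGHTS = [
    ("dev team", "Lack of Focus on Core Mission"),
    ("management", "Lack of Focus on Core Mission"),
    ("customer", "Lack of Focus on Core Mission"),
    ("dev team", "Operational Deficits"),
]


def converged_causes(insights, required=STAKEHOLDERS):
    """Return the causes that every required stakeholder group pointed to."""
    support = {}
    for stakeholder, cause in insights:
        support.setdefault(cause, set()).add(stakeholder)
    # A cause has converged when its supporters include all required groups.
    return sorted(cause for cause, who in support.items() if required <= who)


if __name__ == "__main__":
    print(converged_causes(INSIGHTS))
```

With the sample data, only "Lack of Focus on Core Mission" is returned, because "Operational Deficits" has support from a single group so far and would need another round of conversations.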

Page 61: Rebooting the Team - Surge 2013

Some of our Root Causes

• Lack of Focus on Core Mission

• Team Poorly Aligned with Strategy

• Operational Deficits

• Lack of Customer-Developer Transparency

61

Page 62: Rebooting the Team - Surge 2013

Getting to Solutions

• For each root cause, collaboratively design an action plan that mixes easy wins with longer-term fixes

Quick sampling of our reboot action plans...

62

Page 63: Rebooting the Team - Surge 2013

Solutions: Lack of Focus on Mission

• Early Wins:
– Defined the mission!
– Made all hidden work visible to management
– Outsourced or killed fringe projects and other distractions and aligned effort to the core mission
• Long-term Work
– Align projects to org vision, not vice-versa
– Team structure realignment

63

Page 64: Rebooting the Team - Surge 2013

Solutions: Poor Alignment with Strategy

• Early Wins:
– Product Vision document
– Some early hires not tied to specific projects
• Long-term Work
– Conway’s Law and its application

64

Page 65: Rebooting the Team - Surge 2013

Conway’s Law

“Organizations which design systems are constrained to produce designs which are copies of the communication structures of these organizations.”

Melvin Conway, 1968

65

Page 66: Rebooting the Team - Surge 2013

66

Align to the thing you want to build...

Page 67: Rebooting the Team - Surge 2013

Solutions: Operational Deficits

• Early Wins:
– Evolve tools: SVN → Git, TeamCity → Jenkins, Chef, Vagrant
– Management support for 20% tech debt time
– HipChat
• Long-term Work:
– Ongoing technical debt reduction
– System Monitoring API

67

Page 68: Rebooting the Team - Surge 2013

Solutions: Lack of Customer-Developer Transparency

• Early Wins:
– Formalize a quarterly planning process
– Open work tracking tool to customers
• Long-term Work:
– Refactoring the communications model

68

Page 69: Rebooting the Team - Surge 2013

New Comm Model

Messier, but more effective!

69

Page 70: Rebooting the Team - Surge 2013

Measuring Outcomes

• Probably the most important one is qualitative: “Hey, things feel better now!”  

• Quantitative indications will eventually come. Focus on measures that have meaning to your org’s culture (KPIs?)

• Share metrics and results widely to keep momentum going

70

Page 71: Rebooting the Team - Surge 2013

Unintended Consequences

Be prepared to iterate!

71

Page 72: Rebooting the Team - Surge 2013

Where are we now?

• We feel good about...
– Management support for the development team
– Long term vision and goals
– Awareness of the need to prioritize/focus efforts
– Communication patterns
• We are still working on...
– Acting holistically within a project-specific funding model
– Aligning the staff to best support the desired product
– More effective use of ops tools to automate routine work

72

Page 73: Rebooting the Team - Surge 2013

Lessons Learned

• Know who you are as an organization
– History, context, strengths, weaknesses, constraints, strategy
• An Org-first AND People-first approach is possible
• People are messy
– HRT underlies successful org change
– Assume that everyone has good intentions
– Don’t hide behind technology - wetware issues require face time
• Change is scary. Expect resistance. This is a process of influence - you can’t make others change directly.
• Don’t fear going against the grain - do what’s right for your org, don’t be a slave to any particular process, system, methodology

73

Page 74: Rebooting the Team - Surge 2013

Lessons Learned

Avoid reboots if at all possible!

They’re expensive, disruptive and tricky to execute.

Instead...

Incorporate your newfound tools and techniques into your daily workflow to pay more attention to how change is impacting your team, engage your colleagues, and respond to it proactively before you need another reboot!

74

Page 75: Rebooting the Team - Surge 2013

Thank You!

Continue the conversation...

Twitter: @franfabrizio
Email: [email protected]

Special thanks to Peter Clark (@pclark) for his contributions to an earlier version of this presentation.

75