Focus, Governance, and Innovation: How LinkedIn Scaled to 3M Jira Issues and 500M Members

LinkedIn scales to 3M issues and 500M members A tale of process, people, and technology ARNIE MATZ | DIRECTOR, SOFTWARE ENGINEERING | LINKEDIN DAN HATA | SENIOR ENGINEERING MANAGER | LINKEDIN

Upload: atlassian

Post on 21-Jan-2018


TRANSCRIPT

Page 1: Focus, Governance, and Innovation: How LinkedIn Scaled to 3M Jira Issues and 500M Members

LinkedIn scales to 3M issues and 500M members

A tale of process, people, and technology

ARNIE MATZ | DIRECTOR, SOFTWARE ENGINEERING | LINKEDIN

DAN HATA | SENIOR ENGINEERING MANAGER | LINKEDIN

Page 3: Focus, Governance, and Innovation: How LinkedIn Scaled to 3M Jira Issues and 500M Members

500,000,000+ registered members

138+M UNITED STATES OF AMERICA

29+M Brazil

42+M India

8+M Indonesia
4+M Philippines
3+M Malaysia
1+M Singapore

1+M Japan

1+M Korea

32+M China

1+M New Zealand

8+M Australia

23+M UK
14+M France
10+M Germany
10+M Italy
9+M Spain
7+M Netherlands
3+M Belgium
2+M Denmark
2+M Sweden
1+M Ireland

Page 7: Focus, Governance, and Innovation: How LinkedIn Scaled to 3M Jira Issues and 500M Members

Artifact Repository Review

SCM

Page 8: Focus, Governance, and Innovation: How LinkedIn Scaled to 3M Jira Issues and 500M Members

Development Tools

IDE CI Pipeline

SCM Review

Jira Artifact Repository

Page 9: Focus, Governance, and Innovation: How LinkedIn Scaled to 3M Jira Issues and 500M Members

Artifact Repository

Page 11: Focus, Governance, and Innovation: How LinkedIn Scaled to 3M Jira Issues and 500M Members

Development Operations Marketing Facilities

Jira Business Usage at LinkedIn

Page 13: Focus, Governance, and Innovation: How LinkedIn Scaled to 3M Jira Issues and 500M Members

What did LinkedIn think of Jira in 2015?

Why does my dashboard freeze at 10am?

Jira was fast at my last company.

Why don’t we just build our own Jira?

Why does Jira always crash?

Have we looked into alternatives?

Why is Jira always slow?

Will Jira ever be stable?

Page 14: Focus, Governance, and Innovation: How LinkedIn Scaled to 3M Jira Issues and 500M Members

One last request...

Please fix Jira as soon as possible Arnie

Page 15: Focus, Governance, and Innovation: How LinkedIn Scaled to 3M Jira Issues and 500M Members

[Chart: Issues Count Growth: 2015-2022; x-axis years 2015 to 2021, y-axis 0 to 70]

Page 16: Focus, Governance, and Innovation: How LinkedIn Scaled to 3M Jira Issues and 500M Members

2015 Scary Stuff

Stability and Performance: No understanding of Jira stability and performance issues. No change control process.

Unlimited admins: Many people have admin access, making change control and standardization impossible.

Lucene index corruption: 25% of Jira restarts resulted in index corruption, and recovery took hours.

Rapid custom field growth: Contributes to index growth. There was no governance. Admins said yes to all custom field requests.

Page 17: Focus, Governance, and Innovation: How LinkedIn Scaled to 3M Jira Issues and 500M Members

2015 ASSESSMENT SUMMARY

No governance

No change control

No metrics

Unplanned outages almost every day

Sometimes all day

Thousands of custom fields

Growing by 150% year over year

...and way out of control

Page 18: Focus, Governance, and Innovation: How LinkedIn Scaled to 3M Jira Issues and 500M Members

Three investment areas

• Process
• Technology
• People

Page 19: Focus, Governance, and Innovation: How LinkedIn Scaled to 3M Jira Issues and 500M Members

CHECK POINT: 2015

• 1.2 million issues
• 300+ million members
• 6,000+ employees

People: Process:

Technology:

CRITICAL

REVIEW

CRITICAL

Page 20: Focus, Governance, and Innovation: How LinkedIn Scaled to 3M Jira Issues and 500M Members

Roles For Supporting Jira

App Admins: Focus on customer service; external consultants

Operations: Focus on deterministic change and mitigating risk

Developers: Ensuring performance and scale are built into all solutions

Managers: Focus on governance, strategy, and the Atlassian partnership

Page 21: Focus, Governance, and Innovation: How LinkedIn Scaled to 3M Jira Issues and 500M Members

2015 Team Staffing: Before

App Admins: 4
Developers: 0
Operations: 0
Managers: 0

Page 22: Focus, Governance, and Innovation: How LinkedIn Scaled to 3M Jira Issues and 500M Members

2015 Team Staffing: During

App Admins: 4
Developers: 0
Operations: 0
Managers: 0

Page 23: Focus, Governance, and Innovation: How LinkedIn Scaled to 3M Jira Issues and 500M Members

2015 Team Staffing: After

App Admins: 4
Developers: 1
Operations: 1
Managers: 0

Page 24: Focus, Governance, and Innovation: How LinkedIn Scaled to 3M Jira Issues and 500M Members

Monitoring and SLOs

Atlassian relationship

Governance CRITICAL

CRITICAL

Change control

2015 Process: Before

CRITICAL

WARNING

Page 25: Focus, Governance, and Innovation: How LinkedIn Scaled to 3M Jira Issues and 500M Members

[Chart: Issues Count Growth: 2015-2022; x-axis years 2015 to 2021, y-axis 0 to 70]

Page 26: Focus, Governance, and Innovation: How LinkedIn Scaled to 3M Jira Issues and 500M Members

[Chart: 2015 Issue Count Projections; series: Original 2015 Issue Projection, Issue Projection with Governance in Place; x-axis years 2015 to 2021, y-axis 0 to 70]

Page 27: Focus, Governance, and Innovation: How LinkedIn Scaled to 3M Jira Issues and 500M Members

Under Control

• Removed unused Custom Fields for a 40% overall reduction!

• Limited Custom Field growth through Governance

2015: Custom Fields

Page 28: Focus, Governance, and Innovation: How LinkedIn Scaled to 3M Jira Issues and 500M Members

Governance: Unbounded columns

• Unbounded columns in Jira:
  • Comments
  • Versions

2015 Process Improvements

Page 30: Focus, Governance, and Innovation: How LinkedIn Scaled to 3M Jira Issues and 500M Members

Misuse: What can I do?

• Document and communicate what is acceptable use
• Work with users to find the right solution
• Through technology, make it impossible for misuse to reoccur

2015 Process Improvements

Page 31: Focus, Governance, and Innovation: How LinkedIn Scaled to 3M Jira Issues and 500M Members

Change Control
• Configuration as code
• All changes are tested, reviewed, communicated, with rollback plans

Service Level Objectives
• Tracked and investigated violations
• Example: <2 seconds issue creation time

2015 Process Improvements
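
A minimal sketch of how an SLO like the issue-creation example above might be tracked, assuming request timings are already being collected somewhere. Only the 2-second threshold comes from the slide; the `Sample` record, the function names, and the measurements are hypothetical.

```python
from dataclasses import dataclass

# Threshold from the slide's example SLO: issue creation in under 2 seconds.
ISSUE_CREATE_SLO_SECONDS = 2.0


@dataclass(frozen=True)
class Sample:
    """One timed issue-creation request (hypothetical record layout)."""
    request_id: str
    seconds: float


def find_violations(samples, threshold=ISSUE_CREATE_SLO_SECONDS):
    """Return the samples that met or exceeded the SLO threshold."""
    return [s for s in samples if s.seconds >= threshold]


def compliance_ratio(samples, threshold=ISSUE_CREATE_SLO_SECONDS):
    """Fraction of samples under the threshold (1.0 when there are none)."""
    if not samples:
        return 1.0
    ok = sum(1 for s in samples if s.seconds < threshold)
    return ok / len(samples)


# Made-up measurements, for illustration only.
samples = [
    Sample("req-1", 0.8),
    Sample("req-2", 2.4),
    Sample("req-3", 1.9),
    Sample("req-4", 3.1),
]
violations = find_violations(samples)
ratio = compliance_ratio(samples)
```

Each entry in `violations` is a candidate for investigation, and `ratio` gives a single number to chart over time.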

Page 32: Focus, Governance, and Innovation: How LinkedIn Scaled to 3M Jira Issues and 500M Members

Atlassian Relationship
• Introduced TAM (Technical Account Manager)
• Added Premier Support
• Partnered with TAM and PS at Atlassian to target a performance upgrade
• Extended licensing for end-of-life plugin

2015 Process Improvements

Page 33: Focus, Governance, and Innovation: How LinkedIn Scaled to 3M Jira Issues and 500M Members

Monitoring and SLOs

Atlassian relationship

Governance

SUCCESS

WARNING

Change control ALMOST

2015 Process: After

ALMOST

Page 34: Focus, Governance, and Innovation: How LinkedIn Scaled to 3M Jira Issues and 500M Members

Availability

Hardware

Monitoring and Alerting CRITICAL

CRITICAL

WARNING

2015 Technology: Before

Page 35: Focus, Governance, and Innovation: How LinkedIn Scaled to 3M Jira Issues and 500M Members

Hardware upgrade
• Lucene index on SSD
• Currently 75 GB
• 6 hours rebuild time

2015 Technology Improvements

Page 36: Focus, Governance, and Innovation: How LinkedIn Scaled to 3M Jira Issues and 500M Members

Adding Software-Driven Governance
• Python-Jira client enables innovations
• Replica databases provide read access to applications needing real-time Jira data

2015 Technology Improvements
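
As an illustration of the kind of governance check a Python Jira client makes possible, here is a hedged sketch that counts custom fields against a budget. It assumes the third-party `jira` package, whose `JIRA.fields()` call returns one dict per field. The budget, the function names, and the `FakeClient` stand-in are hypothetical, and nothing here is taken from LinkedIn's actual tooling.

```python
def connect(server, user, token):
    """Open a connection using the third-party `jira` package.

    Imported lazily so the rest of this sketch runs without it installed.
    """
    from jira import JIRA
    return JIRA(server=server, basic_auth=(user, token))


def custom_fields(client):
    """Return the fields the client reports as custom (not built in)."""
    return [f for f in client.fields() if f.get("custom")]


def fields_over_budget(client, budget):
    """Hypothetical governance check: custom fields beyond the budget."""
    return max(0, len(custom_fields(client)) - budget)


class FakeClient:
    """Stand-in with the same `fields()` shape, so the sketch runs offline."""

    def fields(self):
        return [
            {"id": "summary", "name": "Summary", "custom": False},
            {"id": "customfield_10001", "name": "Team", "custom": True},
            {"id": "customfield_10002", "name": "Launch Date", "custom": True},
            {"id": "customfield_10003", "name": "Risk Level", "custom": True},
        ]


client = FakeClient()  # swap in connect(...) against a real instance
names = [f["name"] for f in custom_fields(client)]
excess = fields_over_budget(client, budget=2)
```

A scheduled job could run a check like this and open a review ticket whenever `excess` is above zero.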

Page 37: Focus, Governance, and Innovation: How LinkedIn Scaled to 3M Jira Issues and 500M Members

Leveraging inGraphs
• All application and system resources displayed on a single page

2015 Technology Improvements

Page 43: Focus, Governance, and Innovation: How LinkedIn Scaled to 3M Jira Issues and 500M Members

Jira Data Center

2017 Technology Improvements

Page 44: Focus, Governance, and Innovation: How LinkedIn Scaled to 3M Jira Issues and 500M Members

CHECK POINT

2015: 1.2 million issues, 300+ million members, 6,000+ employees

2016: 2 million issues, 400+ million members, 9,000+ employees

People: Process:

Technology:

CRITICAL

REVIEW

CRITICAL

ALMOST

People: Process:

Technology: REVIEW

REVIEW

Page 45: Focus, Governance, and Innovation: How LinkedIn Scaled to 3M Jira Issues and 500M Members

2016 Team Staffing: Before

App Admins: 4
Developers: 1
Operations: 1
Managers: 0

Page 46: Focus, Governance, and Innovation: How LinkedIn Scaled to 3M Jira Issues and 500M Members

2016 Team Staffing: After

App Admins: 2
Developers: 1
Operations: 1
Managers: 0.5

Page 47: Focus, Governance, and Innovation: How LinkedIn Scaled to 3M Jira Issues and 500M Members

Operational Excellence

Atlassian Relationship

Governance

WARNING

2016 Process: Before

ALMOST

WARNING

Page 48: Focus, Governance, and Innovation: How LinkedIn Scaled to 3M Jira Issues and 500M Members

2016 Process Improvements

Governance
• Documented and communicated
• All requests lead with business requirement
• Scale is the most important requirement
• Automated Governance

Page 49: Focus, Governance, and Innovation: How LinkedIn Scaled to 3M Jira Issues and 500M Members

2016 Process Improvements

Operational Excellence Culture
• Code and config reviews
• Intelligent risk decision
• Change control and communication
• Monitoring and metrics
• Automated remediation
• Service level objectives
• Awesome alerts and response
• Business continuity plan
• Relentless pursuit of exception causation
• Blameless postmortems

Page 50: Focus, Governance, and Innovation: How LinkedIn Scaled to 3M Jira Issues and 500M Members

2016 Process Improvements

Partnering with Atlassian
• TAM relationship evolved from tactical to strategic in 2016
• Partnering with TAM for all major upgrades
• Atlassian Premier Support provided a critical bug fix over the holidays to address a bug in a widely used gadget

Page 51: Focus, Governance, and Innovation: How LinkedIn Scaled to 3M Jira Issues and 500M Members

Operational Excellence

Atlassian Relationship

Governance

2016 Process: After

SUCCESS

ALMOST

SUCCESS

Page 52: Focus, Governance, and Innovation: How LinkedIn Scaled to 3M Jira Issues and 500M Members

Availability

Hardware

Monitoring and Alerting

CRITICAL

2016 Technology: Before

WARNING

ALMOST

Page 53: Focus, Governance, and Innovation: How LinkedIn Scaled to 3M Jira Issues and 500M Members

User blacklisting and throttling
• Implemented blacklisting based on username
• Throttling based on requests/minute per host

2016 Technology Improvements

# Jira.conf
# Blacklist a user by adding an entry with a value of 1.
map $remote_user $user_blacklisted {
    default 0;
    "johnnynumberfive" 1;
}
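
The slide shows only the blacklist map. The two ideas together (username blacklist plus a requests-per-minute limit per host) can be sketched in Python as below. This illustrates the logic with a sliding window. It is not the proxy configuration LinkedIn used, and the class name, function name, and limit are assumptions.

```python
from collections import defaultdict, deque

# Usernames denied outright, mirroring the map entry on the slide.
BLACKLIST = {"johnnynumberfive"}


class HostThrottle:
    """Sliding-window limiter: at most `limit` requests per window per host."""

    def __init__(self, limit_per_minute, window_seconds=60.0):
        self.limit = limit_per_minute
        self.window = window_seconds
        self._hits = defaultdict(deque)  # host -> timestamps of admitted requests

    def allow(self, host, now):
        hits = self._hits[host]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True


def admit(throttle, user, host, now):
    """Reject blacklisted users first, then apply the per-host limit."""
    if user in BLACKLIST:
        return False
    return throttle.allow(host, now)


throttle = HostThrottle(limit_per_minute=2)  # hypothetical limit
decisions = [
    admit(throttle, "alice", "10.0.0.1", 0.0),
    admit(throttle, "alice", "10.0.0.1", 1.0),
    admit(throttle, "alice", "10.0.0.1", 2.0),   # third request inside the window
    admit(throttle, "johnnynumberfive", "10.0.0.2", 2.0),
    admit(throttle, "alice", "10.0.0.1", 61.0),  # earlier hits have expired
]
```

Timestamps are passed in explicitly so the behavior is easy to test; a real deployment would read them from a clock.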

Page 54: Focus, Governance, and Innovation: How LinkedIn Scaled to 3M Jira Issues and 500M Members

Larger Hardware, Tuned Instance
• 64GB upgraded to 256GB
• JVM increased

2016 Technology Improvements

Page 55: Focus, Governance, and Innovation: How LinkedIn Scaled to 3M Jira Issues and 500M Members

Leveraging inGraphs
• Monitoring and alerting on all bottlenecks

2016 Technology Improvements

Page 58: Focus, Governance, and Innovation: How LinkedIn Scaled to 3M Jira Issues and 500M Members

Adding in ELK

Elasticsearch: Horizontally scalable data storage

Logstash: Parsing logs to make useful data

Kibana: Create dashboards showing insightful data
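
To make the "parsing logs to make useful data" step concrete, here is a small Python sketch that turns one access-log line into a structured event, which is the kind of transformation a log pipeline performs before indexing. The log layout (common access-log fields plus a trailing response time in milliseconds) and the sample line are assumptions for illustration, not LinkedIn's actual format.

```python
import re

# Assumed layout: host, ident, user, [timestamp], "request", status, bytes, millis
LINE_RE = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) (?P<millis>\d+)$'
)


def parse_line(line):
    """Return a dict of typed fields, or None if the line does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    event = match.groupdict()
    event["status"] = int(event["status"])
    event["millis"] = int(event["millis"])
    event["bytes"] = 0 if event["bytes"] == "-" else int(event["bytes"])
    return event


# Made-up sample line.
line = ('10.0.0.1 - alice [21/Jan/2018:10:00:00 +0000] '
        '"GET /browse/ABC-1 HTTP/1.1" 200 5120 340')
event = parse_line(line)
```

Once events carry typed fields such as `user`, `path`, and `millis`, dashboards can aggregate by user or endpoint.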

Page 59: Focus, Governance, and Innovation: How LinkedIn Scaled to 3M Jira Issues and 500M Members

Adding in ELK

Page 63: Focus, Governance, and Innovation: How LinkedIn Scaled to 3M Jira Issues and 500M Members

CHECK POINT

2015: 1.2 million issues, 300+ million members, 6,000+ employees

2016: 2 million issues, 400+ million members, 9,000+ employees

2017: 3 million issues, 500+ million members, 10,000+ employees

People: Process:

Technology:

CRITICAL

REVIEW

CRITICAL

2015

ALMOST

SUCCESS

People: Process:

Technology: ALMOST

2017

ALMOST

People: Process:

Technology: ALMOST

REVIEW

REVIEW

Page 65: Focus, Governance, and Innovation: How LinkedIn Scaled to 3M Jira Issues and 500M Members

Operational Excellence

Atlassian Relationship

Product Vision

SUCCESS

2017 Process: Before

ALMOST

SUCCESS

Page 66: Focus, Governance, and Innovation: How LinkedIn Scaled to 3M Jira Issues and 500M Members

Roles For Supporting Jira

App Admins

Operations

Developers

Manager

Product Owner: Understands customer requirements and prioritizes work.

Page 67: Focus, Governance, and Innovation: How LinkedIn Scaled to 3M Jira Issues and 500M Members

Roles For Supporting Jira

App Admins: 2
Operations: 1
Developers: 1
Manager: 0.5
Product Owner: 0 (understands customer requirements and prioritizes work)

Page 68: Focus, Governance, and Innovation: How LinkedIn Scaled to 3M Jira Issues and 500M Members

2017 Process Improvements

Partnering with Atlassian
• Networking with the Jira community
• Providing feedback and requesting features

Page 69: Focus, Governance, and Innovation: How LinkedIn Scaled to 3M Jira Issues and 500M Members

Availability

Hardware

Monitoring and Alerting

2017 Technology: Before

ALMOST

ALMOST

ALMOST

Page 70: Focus, Governance, and Innovation: How LinkedIn Scaled to 3M Jira Issues and 500M Members

Real User Monitoring
• Performance regression reports emailed daily
• Response times include rendering
• Global statistics give us insight into latency

2017 Technology Improvements
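
One way a daily performance regression report could be computed is to compare a high percentile of response times per page between two days. The sketch below does that under stated assumptions: the 95th percentile, the 20% tolerance, the page names, and the timings are all hypothetical and are not drawn from the talk.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def regressions(baseline, current, pct=95, tolerance=1.2):
    """Return pages whose percentile grew by more than `tolerance` times.

    Both arguments map a page name to a list of response times in seconds.
    Pages missing from the baseline are skipped.
    """
    report = {}
    for page, times in current.items():
        if page not in baseline:
            continue
        before = percentile(baseline[page], pct)
        after = percentile(times, pct)
        if after > before * tolerance:
            report[page] = (before, after)
    return report


# Made-up timings that include rendering, in seconds.
yesterday = {"dashboard": [1.0, 1.1, 1.2, 1.3], "issue_view": [0.5, 0.6, 0.7, 0.8]}
today = {"dashboard": [1.0, 1.2, 1.9, 2.0], "issue_view": [0.5, 0.6, 0.7, 0.85]}
report = regressions(yesterday, today)
```

Here only `dashboard` is flagged, because its 95th percentile moved from 1.3 to 2.0 seconds while `issue_view` stayed within tolerance.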

Page 73: Focus, Governance, and Innovation: How LinkedIn Scaled to 3M Jira Issues and 500M Members

Jira Data Center
• Four nodes improve our MTTR by avoiding lengthy index rebuilds
• Resilient to the "single click of death"

2017 Technology Improvements

Page 74: Focus, Governance, and Innovation: How LinkedIn Scaled to 3M Jira Issues and 500M Members

Getting to Scale

Always ask why!

Invest in the team

Build vendor relationship

Lather, rinse, repeat

Page 76: Focus, Governance, and Innovation: How LinkedIn Scaled to 3M Jira Issues and 500M Members

Thank you!

ARNIE MATZ | DIRECTOR, SOFTWARE ENGINEERING | LINKEDIN

DAN HATA | SENIOR ENGINEERING MANAGER | LINKEDIN