Hard & Soft Skills to Avoid Outages, by Pascal-Louis Perez

Posted on 10-May-2015

Category: Technology

DESCRIPTION

In just a few years, Square has achieved ubiquitous recognition for mobile card processing, grossing over $15B a year in credit card transactions. At the heart of Square's technology are many financial systems which must operate safely and correctly while sustaining rapid growth. This presentation will cover non-controversial topics such as TDD, but from new angles. It will also cover emerging practices like continuous deployment, and softer areas such as engineering management practices geared towards safety. You'll come out of this session with a fresh perspective on how to build software.

TRANSCRIPT

Hard & Soft Skills to Avoid Outages

@pascallouis from @SquareNY

Code | git rm | Profit! | Ship | Maintain | Bless

Tactics

• Fighting id mix-ups

• Entity-bound ids (e.g. Id<T>)

• Textual ids, e.g. MWDN-YP89-OLVL-USER

• Testable configurations

• etc.
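The entity-bound and textual id tactics could look roughly like the sketch below. It is an illustration only: the `User` and `Payment` classes, the `parse_id` and `load_user` helpers, and the reading of the trailing `USER` segment as the entity name are all assumptions, not something the slides spell out.

```python
from dataclasses import dataclass
from typing import Generic, Type, TypeVar

T = TypeVar("T")


class User:
    """Hypothetical entity, for illustration only."""


class Payment:
    """Hypothetical entity, for illustration only."""


@dataclass(frozen=True)
class Id(Generic[T]):
    """An id bound to the entity type it identifies."""
    entity: Type[T]
    value: str

    def __str__(self) -> str:
        # Assumed textual form: the value followed by the entity name.
        return f"{self.value}-{self.entity.__name__.upper()}"


def parse_id(text: str, entity: Type[T]) -> Id[T]:
    """Parse a textual id, rejecting ids minted for another entity."""
    value, _, suffix = text.rpartition("-")
    if not value or suffix != entity.__name__.upper():
        raise ValueError(f"{text!r} is not a {entity.__name__} id")
    return Id(entity, value)


def load_user(user_id: Id[User]) -> str:
    # A static checker such as mypy would reject an Id[Payment] here.
    return f"loading user {user_id}"
```

With ids shaped like this, passing a payment id where a user id is expected fails either at type-check time or when the text is parsed, instead of silently reading the wrong row.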

TDD

• Not controversial (anymore)

• Living code documentation

• Enables collaboration

• Technique to encode invariants
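As a small illustration of a test that encodes an invariant, the sketch below pins down "capturing funds never changes the total". The `capture` function and its cents-based arguments are hypothetical and stand in for real domain code.

```python
from typing import Tuple


def capture(held: int, captured: int, amount: int) -> Tuple[int, int]:
    """Move `amount` cents from held funds to captured funds."""
    if amount < 0 or amount > held:
        raise ValueError("amount must be between 0 and the held funds")
    return held - amount, captured + amount


def test_capture_preserves_total() -> None:
    held, captured = capture(held=500, captured=0, amount=200)
    # The invariant: money moves between buckets, it is never created or lost.
    assert held + captured == 500
    assert (held, captured) == (300, 200)
```

The test doubles as documentation: a reader learns the rule from the assertion without reading the implementation.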

Gold Tests

• Tests which can be changed only by a (small) subset of engineering

• Enforced via policy or technology
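One way the "technology" side of enforcement might be sketched is a pre-merge check over the files a change touches. Everything named here is an assumption made for illustration: the `tests/gold/` location, the owner list, and the `gold_violations` helper.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List

# Hypothetical conventions: where gold tests live and who may edit them.
GOLD_PATTERNS = ("tests/gold/*",)
GOLD_OWNERS = frozenset({"alice", "bob"})


def gold_violations(author: str, changed_paths: Iterable[str]) -> List[str]:
    """Return the gold test files that `author` is not allowed to change."""
    if author in GOLD_OWNERS:
        return []
    return [
        path
        for path in changed_paths
        if any(fnmatchcase(path, pattern) for pattern in GOLD_PATTERNS)
    ]
```

A continuous integration job could fail the build whenever this list is non-empty, which turns the policy into something a reviewer cannot overlook.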

Expressive Tests

• “Change your language and you change your thoughts” (Karl Albrecht)

• Can be implementation agnostic

...
Given feed PaymentEventFeedListener receives:
"""
{
  "payment_id": "EPT-300",
  "isTivoReplay": false,
  "merchant": { "token": "m-1" },
  ...
}
"""
Then expect table balance_changing_events order by id:
  | event_type | status      | process_attempts |
  | HOLD       | UNPROCESSED | 1                |
  | CAPTURE    | UNPROCESSED | 0                |
When then the time is 2012-01-06 17:10:00
And balance changing event queue processes items
Then expect table balance_changing_events order by id:
  | event_type | status      | process_attempts |
  | HOLD       | UNPROCESSED | 2                |
  | CAPTURE    | PROCESSED   | 1                |
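A step such as "Then expect table ..." could be backed by a small helper like the one below. This is a sketch under assumptions, not the tooling used in the talk: `parse_table` and `expect_table` are hypothetical names, and rows are assumed to arrive as dictionaries already ordered by id.

```python
from typing import Any, Dict, List, Mapping, Sequence


def parse_table(text: str) -> List[Dict[str, str]]:
    """Turn a pipe-delimited table into one dict per row, keyed by header."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    cells = [[c.strip() for c in line.strip("|").split("|")] for line in lines]
    header, *body = cells
    return [dict(zip(header, row)) for row in body]


def expect_table(actual_rows: Sequence[Mapping[str, Any]], expected_text: str) -> None:
    """Compare only the columns named in the expected table."""
    expected = parse_table(expected_text)
    assert len(actual_rows) == len(expected), "row count differs"
    # Project each actual row onto the expected columns, as strings.
    actual = [
        {column: str(row[column]) for column in want}
        for row, want in zip(actual_rows, expected)
    ]
    assert actual == expected, f"expected {expected}, got {actual}"
```

Because the comparison looks only at named columns, the same scenario text keeps working if the storage layer gains columns or changes representation, which is one reading of "implementation agnostic".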

[Figure: quality plotted over time, with curves labelled "Automated" and "Manual" and an "Oups!" annotation]

Code Analysis

• In theory: static vs dynamic

• In practice: pre- vs post-production

Pre Analysis

• Type Checking

• Testing, CI

• Linters

• Forbidden Call Analysis
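Forbidden call analysis can be approximated with a syntax-tree walk that runs before code ships. The sketch below uses Python's standard `ast` module; the deny list and the `forbidden_calls` helper are hypothetical, and a name-based match like this one will not see through aliases or re-exports.

```python
import ast
from typing import List, Optional, Tuple

# Hypothetical deny list; a real one would be project specific.
FORBIDDEN_CALLS = frozenset({"eval", "os.system"})


def dotted_name(node: ast.expr) -> Optional[str]:
    """Render `a.b.c` style call targets as a dotted string."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        base = dotted_name(node.value)
        return f"{base}.{node.attr}" if base else None
    return None


def forbidden_calls(source: str) -> List[Tuple[int, str]]:
    """Return (line number, call name) for every forbidden call in `source`."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            name = dotted_name(node.func)
            if name in FORBIDDEN_CALLS:
                found.append((node.lineno, name))
    return sorted(found)
```

Run in CI, a check like this catches a banned call at review time rather than in production.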

Post Analysis

• Logging

• Metrics

• Invariant Checking
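In production, an invariant check usually records a violation rather than crashing the request. A minimal sketch, assuming a `Counter` as a stand-in for a real metrics client and a hypothetical `check_invariant` helper:

```python
import logging
from collections import Counter
from typing import Any

log = logging.getLogger("invariants")
violations: Counter = Counter()  # stand-in for a real metrics client


def check_invariant(name: str, holds: bool, **context: Any) -> bool:
    """Log and count a violated invariant instead of raising."""
    if not holds:
        violations[name] += 1
        log.error("invariant violated: %s %s", name, context)
    return holds
```

A caller might write `check_invariant("balance_non_negative", balance >= 0, balance=balance)` after each balance update, so the log line and the counter both carry the evidence needed later.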

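Speaking of Alerts: Metrics vs Checks

[Figure: a check taking the values 0 and 1, with the labels "?", "OK" and "WARNING", beside a metric on a 0ms to 200ms scale]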
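The difference between a metric and a check can be sketched as a function that collapses measured values into a state. The threshold, the `latency_check` name, and the reading of "?" as an unknown state are assumptions made for illustration.

```python
from typing import Sequence


def latency_check(samples_ms: Sequence[float], warn_above_ms: float = 200.0) -> str:
    """Collapse a latency metric into a check state."""
    if not samples_ms:
        # No data is its own state: the check cannot say OK or WARNING.
        return "UNKNOWN"
    return "WARNING" if max(samples_ms) > warn_above_ms else "OK"
```

The metric keeps the full range of values for later analysis, while the check answers a single yes or no question that an alert can act on.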

Alerting & Reporting

| Response  | Precise signal | Imprecise signal |
| Immediate | Alert          | Oups!            |
| Deferred  | Report         | Report           |

Fix It Weeks

• Time set aside, monthly or quarterly

• No top-down mandate except “fix it”

Post-Mortem

• When: anytime there are issues!

• Why: learn and avoid mistakes of the past

• How: blameless

Post-Mortem

• Go through the timeline

• The Good, The Bad and the Ugly

• Action Items

Root Cause Analysis

Proportional Investing

• When you lose N hours to maintenance, you spend an equivalent N hours on improving things.

Safety drives productivity and unleashes creativity.

Technology, sure. But it’s mostly about culture and people.

Many layers of defense, lots of ways to do it — find what’s right for your team.