Transcript
Page 1: Hard & Soft Skills to Avoid Outages by Pascal-Louis Perez

Hard & Soft Skills to Avoid Outages

@pascallouis from @SquareNY

Page 2: Hard & Soft Skills to Avoid Outages by Pascal-Louis Perez
Page 3: Hard & Soft Skills to Avoid Outages by Pascal-Louis Perez

[Slide diagram labels: Code, git rm, Profit!, Ship, Maintain, Bless]

Page 4: Hard & Soft Skills to Avoid Outages by Pascal-Louis Perez


Page 5: Hard & Soft Skills to Avoid Outages by Pascal-Louis Perez

Tactics

• Fighting id mix-ups

• Entity-bound ids (e.g. Id<T>)

• Textual ids (e.g. MWDN-YP89-OLVL-USER)

• Testable configurations

• etc.
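To make the entity-bound id tactic concrete, here is a minimal Python sketch. The `Id` class, the `User` and `Merchant` entities, and `load_user` are hypothetical names chosen for illustration, not taken from the talk. The runtime check is a second layer behind what a static type checker would report from the `Id[User]` annotation.

```python
from dataclasses import dataclass
from typing import Generic, Type, TypeVar

T = TypeVar("T")


class User:
    """Hypothetical entity."""


class Merchant:
    """Hypothetical entity."""


@dataclass(frozen=True)
class Id(Generic[T]):
    """An id that remembers which entity type it belongs to."""
    entity: Type[T]
    value: int


def load_user(user_id: Id[User]) -> str:
    # A static checker would flag an Id[Merchant] argument here; this runtime
    # check covers call sites that are not type checked.
    if user_id.entity is not User:
        raise TypeError(f"expected a User id, got a {user_id.entity.__name__} id")
    return f"user#{user_id.value}"


user_id = Id(User, 42)
merchant_id = Id(Merchant, 42)

# Same numeric value, but the two ids are not interchangeable.
assert user_id != merchant_id
```

Passing `merchant_id` to `load_user` raises `TypeError` instead of silently loading user 42.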

Page 6: Hard & Soft Skills to Avoid Outages by Pascal-Louis Perez

TDD

• Not controversial (anymore)

• Living code documentation

• Enables collaboration

• Technique to encode invariants

Page 7: Hard & Soft Skills to Avoid Outages by Pascal-Louis Perez

Gold Tests

• Tests that can be changed only by a (small) subset of engineering

• Enforced via policy or technology
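One way the "enforced via technology" option might look is a pre-merge check that rejects gold test changes from anyone outside an owners list. This is only a sketch: the `tests/gold` directory, the owner names, and the `gold_test_violations` function are assumptions made for illustration.

```python
from pathlib import PurePosixPath
from typing import List

# Hypothetical policy: files under this directory are gold tests, and only
# these people may change them.
GOLD_TEST_DIR = PurePosixPath("tests/gold")
GOLD_TEST_OWNERS = {"alice", "bob"}


def gold_test_violations(author: str, changed_paths: List[str]) -> List[str]:
    """Return the gold test files that `author` is not allowed to change."""
    if author in GOLD_TEST_OWNERS:
        return []
    return [
        path for path in changed_paths
        if GOLD_TEST_DIR in PurePosixPath(path).parents
    ]
```

A continuous integration step could call this with the change author and the list of touched files, and fail the build when the result is non-empty.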

Page 8: Hard & Soft Skills to Avoid Outages by Pascal-Louis Perez

Expressive Tests

• “Change your language and you change your thoughts” (Karl Albrecht)

• Can be implementation agnostic

Page 9: Hard & Soft Skills to Avoid Outages by Pascal-Louis Perez


...
Given feed PaymentEventFeedListener receives:
"""
{
  "payment_id": "EPT-300",
  "isTivoReplay": false,
  "merchant": { "token": "m-1" },
  ...
}
"""
Then expect table balance_changing_events order by id:
  | event_type | status      | process_attempts |
  | HOLD       | UNPROCESSED | 1                |
  | CAPTURE    | UNPROCESSED | 0                |
When then the time is 2012-01-06 17:10:00
And balance changing event queue processes items
Then expect table balance_changing_events order by id:
  | event_type | status      | process_attempts |
  | HOLD       | UNPROCESSED | 2                |
  | CAPTURE    | PROCESSED   | 1                |
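The "expect table" steps in the test above could be backed by a small helper that turns the pipe-delimited text into rows to compare against the database. The sketch below is one possible shape, not the framework used in the talk; `parse_table` and the in-memory `actual` rows are hypothetical.

```python
from typing import Dict, List


def parse_table(text: str) -> List[Dict[str, str]]:
    """Parse a pipe-delimited table into a list of dicts keyed by header."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in text.strip().splitlines()
    ]
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


expected = parse_table("""
    | event_type | status      | process_attempts |
    | HOLD       | UNPROCESSED | 1                |
    | CAPTURE    | UNPROCESSED | 0                |
""")

# Stand-in for rows read from the balance_changing_events table.
actual = [
    {"event_type": "HOLD", "status": "UNPROCESSED", "process_attempts": "1"},
    {"event_type": "CAPTURE", "status": "UNPROCESSED", "process_attempts": "0"},
]
assert actual == expected
```

Because the comparison works on plain rows, the same test text stays valid if the code that writes the table is rewritten.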

Page 10: Hard & Soft Skills to Avoid Outages by Pascal-Louis Perez

[Chart with axes Quality and Time, comparing Automated and Manual; annotation: Oups!]

Page 11: Hard & Soft Skills to Avoid Outages by Pascal-Louis Perez

Code Analysis

• In theory: static vs dynamic

• In practice: pre- vs post-production

Page 12: Hard & Soft Skills to Avoid Outages by Pascal-Louis Perez

Pre Analysis

• Type Checking

• Testing, CI

• Linters

• Forbidden Call Analysis
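Forbidden call analysis can be approximated with a short syntax-tree walk. The sketch below uses Python's standard `ast` module; the deny-list contents and the `find_forbidden_calls` name are assumptions for illustration, and it only matches calls written by their plain dotted name.

```python
import ast
from typing import List, Tuple

# Hypothetical deny-list; a real one would reflect a team's own rules.
FORBIDDEN_CALLS = {"eval", "exec", "os.system"}


def dotted_name(node: ast.AST) -> str:
    """Rebuild names such as `os.system`; return '' for anything else."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        prefix = dotted_name(node.value)
        return f"{prefix}.{node.attr}" if prefix else ""
    return ""


def find_forbidden_calls(source: str) -> List[Tuple[int, str]]:
    """Return (line, name) for each call to a forbidden function."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            name = dotted_name(node.func)
            if name in FORBIDDEN_CALLS:
                hits.append((node.lineno, name))
    return sorted(hits)
```

Run before merge, a non-empty result can fail the build, which keeps the rule in the pre-production half of the analysis.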

Page 13: Hard & Soft Skills to Avoid Outages by Pascal-Louis Perez

Post Analysis

• Logging

• Metrics

• Invariant Checking
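As a sketch of post-production invariant checking, the function below verifies a property of live data and logs a violation instead of crashing. The invariant itself (ledger amounts in cents sum to zero) and the `check_balanced` name are hypothetical examples, not taken from the talk.

```python
import logging
from typing import List

logger = logging.getLogger("invariants")


def check_balanced(ledger_entries: List[int]) -> bool:
    """Hypothetical invariant: amounts (in cents) in a ledger sum to zero."""
    total = sum(ledger_entries)
    if total != 0:
        # Report rather than crash: the check runs against live data.
        logger.error("ledger out of balance by %d cents", total)
        return False
    return True
```

A periodic job could run checks like this and feed the results into logging and metrics, the other two items on the slide.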

Page 14: Hard & Soft Skills to Avoid Outages by Pascal-Louis Perez

Speaking of Alerts: Metrics vs Checks

[Figure labels: ?, OK, WARNING; axis values 0 to 1 and 0ms to 200ms]
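One reading of the metrics versus checks distinction is that a metric is a raw measurement, while a check is a verdict derived from it. The sketch below assumes that reading; the `Status` enum, the `latency_check` function, and the choice of 200ms as the threshold (borrowed from the figure's axis label) are illustrative assumptions.

```python
from enum import Enum


class Status(Enum):
    OK = "OK"
    WARNING = "WARNING"


# Hypothetical threshold, borrowed from the 200ms mark in the figure.
LATENCY_THRESHOLD_MS = 200.0


def latency_check(latency_ms: float) -> Status:
    """Turn a latency metric (a number) into a check result (a verdict)."""
    return Status.WARNING if latency_ms > LATENCY_THRESHOLD_MS else Status.OK
```

The metric keeps its full resolution for graphs and reports, while the check gives an alerting system a small set of states to act on.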

Page 15: Hard & Soft Skills to Avoid Outages by Pascal-Louis Perez

Alerting & Reporting

| Response \ Signal | Precise | Imprecise |
| Immediate         | Alert   | Oups!     |
| Deferred          | Report  | Report    |

Page 16: Hard & Soft Skills to Avoid Outages by Pascal-Louis Perez

Fix It Weeks

• Time set aside, monthly or quarterly

• No top-down mandate except “fix it”

Page 17: Hard & Soft Skills to Avoid Outages by Pascal-Louis Perez


Page 18: Hard & Soft Skills to Avoid Outages by Pascal-Louis Perez

Post-Mortem

• When: Anytime there are issues!

• Why: Learn and avoid mistakes of the past

• How: Blameless

Page 19: Hard & Soft Skills to Avoid Outages by Pascal-Louis Perez


Post-Mortem

• Go through the timeline

• The Good, The Bad and the Ugly

• Action Items

Page 20: Hard & Soft Skills to Avoid Outages by Pascal-Louis Perez


Root Cause Analysis

Page 21: Hard & Soft Skills to Avoid Outages by Pascal-Louis Perez


Page 22: Hard & Soft Skills to Avoid Outages by Pascal-Louis Perez


Proportional Investing

• When you lose N hours to maintenance, you spend an equivalent N hours on improving things.

Page 23: Hard & Soft Skills to Avoid Outages by Pascal-Louis Perez

Safety drives productivity and unleashes creativity.

Technology, sure. But it’s mostly about culture and people.

Many layers of defense, lots of ways to do it: find what’s right for your team.

Page 24: Hard & Soft Skills to Avoid Outages by Pascal-Louis Perez
