Un-Broken Logging: the foundation of software operability

TechnologyUG, Leeds (#techug), Thursday 22nd October 2015

Matthew Skelton, Skelton Thatcher Consulting (@matthewpskelton)

Upload: skelton-thatcher-consulting-ltd

Post on 15-Jan-2017

Category:

Software


TRANSCRIPT

Page 1: Un-broken Logging - TechnologyUG - Leeds - Matthew Skelton

Un-Broken Logging

the foundation of software operability

TechnologyUG, Leeds #techug

Thursday 22nd October 2015

Matthew Skelton

Skelton Thatcher Consulting

@matthewpskelton

Page 3: Un-broken Logging - TechnologyUG - Leeds - Matthew Skelton

The way we use logging is (often) broken

How to make our logging more awesome

Why we should care

Page 7: Un-broken Logging - TechnologyUG - Leeds - Matthew Skelton

Matthew Skelton

@matthewpskelton

#techug

Page 12: Un-broken Logging - TechnologyUG - Leeds - Matthew Skelton

@Operability

#operability

WhoOwnsMyOperability.com

Page 13: Un-broken Logging - TechnologyUG - Leeds - Matthew Skelton

confession:

I am a big fan of logging

Page 15: Un-broken Logging - TechnologyUG - Leeds - Matthew Skelton

exceptional situations

edge cases

metrics

analytics

‘audits’

…@evanphx

Page 16: Un-broken Logging - TechnologyUG - Leeds - Matthew Skelton

execution trace

Page 17: Un-broken Logging - TechnologyUG - Leeds - Matthew Skelton

BAD STUFF

Page 18: Un-broken Logging - TechnologyUG - Leeds - Matthew Skelton

Logging is often unloved

1. Discontinuous

2. Errors only, or arbitrary

3. ‘Bolted on’

4. No aggregation & search

5. Specify severity up front

Page 19: Un-broken Logging - TechnologyUG - Leeds - Matthew Skelton

GOOD STUFF

Page 20: Un-broken Logging - TechnologyUG - Leeds - Matthew Skelton

How to make logging awesome

1. Continuous event IDs

2. Transaction tracing

3. Log aggregation & search tools

4. Design for logging

5. Decoupled severity

Page 21: Un-broken Logging - TechnologyUG - Leeds - Matthew Skelton

reduce time-to-detect

increase team engagement

increase configurability

enhance DevOps collaboration

#operability

Page 22: Un-broken Logging - TechnologyUG - Leeds - Matthew Skelton

Background

Page 23: Un-broken Logging - TechnologyUG - Leeds - Matthew Skelton

Autonomous weather station

Page 26: Un-broken Logging - TechnologyUG - Leeds - Matthew Skelton

MRI brain scan imaging

Page 29: Un-broken Logging - TechnologyUG - Leeds - Matthew Skelton

Oil well monitoring

Page 32: Un-broken Logging - TechnologyUG - Leeds - Matthew Skelton

Web-scale systems

Page 39: Un-broken Logging - TechnologyUG - Leeds - Matthew Skelton

logging makes things work

Page 40: Un-broken Logging - TechnologyUG - Leeds - Matthew Skelton

(event sourcing)

(structured logging)

(CQRS)

Page 41: Un-broken Logging - TechnologyUG - Leeds - Matthew Skelton

How is logging usually broken?

Page 42: Un-broken Logging - TechnologyUG - Leeds - Matthew Skelton

Logging is often unloved

1. Discontinuous

2. Errors only, or arbitrary

3. ‘Bolted on’

4. No aggregation & search

5. Specify severity up front

Page 44: Un-broken Logging - TechnologyUG - Leeds - Matthew Skelton

using logging mainly for errors

Page 46: Un-broken Logging - TechnologyUG - Leeds - Matthew Skelton

inconsistent use of logging

Page 48: Un-broken Logging - TechnologyUG - Leeds - Matthew Skelton

logging slows down the software

Page 50: Un-broken Logging - TechnologyUG - Leeds - Matthew Skelton

logging ‘pollutes’ my precious domain model

Page 52: Un-broken Logging - TechnologyUG - Leeds - Matthew Skelton

logging is just for those weird Ops people

Page 54: Un-broken Logging - TechnologyUG - Leeds - Matthew Skelton

logging assumed to be free ($0) to implement

no budget for aggregating logs across machines

Page 56: Un-broken Logging - TechnologyUG - Leeds - Matthew Skelton

log aggregation happens only in Production

logs not available to Devs

Page 58: Un-broken Logging - TechnologyUG - Leeds - Matthew Skelton

fights over log severity levels

Page 60: Un-broken Logging - TechnologyUG - Leeds - Matthew Skelton

poor time synchronisation

Page 62: Un-broken Logging - TechnologyUG - Leeds - Matthew Skelton

Some history, with pirates

Page 67: Un-broken Logging - TechnologyUG - Leeds - Matthew Skelton

weather, course, sightings, latitude, longitude, …

(even when quiet)

Page 68: Un-broken Logging - TechnologyUG - Leeds - Matthew Skelton

John Harrison

Page 69: Un-broken Logging - TechnologyUG - Leeds - Matthew Skelton

Why log?

Page 70: Un-broken Logging - TechnologyUG - Leeds - Matthew Skelton

verification

traceability

accountability

charting the waters

Page 72: Un-broken Logging - TechnologyUG - Leeds - Matthew Skelton

- June 13th –Pirates!!!!

- Weds –Sharks!!!

- 19th Jun –BIGGER sharks!!!!

Page 73: Un-broken Logging - TechnologyUG - Leeds - Matthew Skelton

How to make logging awesome

Page 74: Un-broken Logging - TechnologyUG - Leeds - Matthew Skelton

How to make logging awesome

1. Continuous event IDs

2. Transaction tracing

3. Log aggregation & search tools

4. Design for logging

5. Decoupled severity

Page 76: Un-broken Logging - TechnologyUG - Leeds - Matthew Skelton

Storage I/O

Worker Job

Queue

Upload

Page 77: Un-broken Logging - TechnologyUG - Leeds - Matthew Skelton

Continuous event IDs

Page 78: Un-broken Logging - TechnologyUG - Leeds - Matthew Skelton

How many distinct event types (state transitions) in your application?

Page 80: Un-broken Logging - TechnologyUG - Leeds - Matthew Skelton

represent distinct states

Page 81: Un-broken Logging - TechnologyUG - Leeds - Matthew Skelton

enum

Human-readable sets: unique values, sparse, immutable

C#, Java, Python, Node (Ruby, PHP, …)

Page 82: Un-broken Logging - TechnologyUG - Leeds - Matthew Skelton

public enum EventID
{
    // Badly-initialised logging data
    NotSet = 0,

    // An unrecognised event has occurred
    UnexpectedError = 10000,

    ApplicationStarted = 20000,
    ApplicationShutdownNoticeReceived = 20001,

    PageGenerationStarted = 30000,
    PageGenerationCompleted = 30001,

    MessageQueued = 40000,
    MessagePeeked = 40001,

    BasketItemAdded = 60001,
    BasketItemRemoved = 60002,

    CreditCardDetailsSubmitted = 70001,

    // ...
}

Page 83: Un-broken Logging - TechnologyUG - Leeds - Matthew Skelton

public enum EventID
{
    // --- Technical events ---

    // Badly-initialised logging data
    NotSet = 0,

    // An unrecognised event has occurred
    UnexpectedError = 10000,

    ApplicationStarted = 20000,
    ApplicationShutdownNoticeReceived = 20001,

    PageGenerationStarted = 30000,
    PageGenerationCompleted = 30001,

    MessageQueued = 40000,
    MessagePeeked = 40001,

    // --- Domain events ---

    BasketItemAdded = 60001,
    BasketItemRemoved = 60002,

    CreditCardDetailsSubmitted = 70001,

    // ...
}
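
The slides name Python among the languages with a usable enum type. Below is a minimal sketch of the same idea in Python, assuming only the standard library `enum` and `logging` modules. Only a few of the IDs from the slide are reproduced, and `format_event` and `log_event` are hypothetical helper names for illustration, not something from the talk.

```python
import logging
from enum import IntEnum, unique


@unique  # reject accidental duplicate values when the class is defined
class EventID(IntEnum):
    # Technical events
    NotSet = 0
    UnexpectedError = 10000
    ApplicationStarted = 20000
    # Domain events
    BasketItemAdded = 60001
    BasketItemRemoved = 60002


def format_event(event: EventID, message: str) -> str:
    """Prefix a message with the numeric ID and the human-readable name."""
    return f"{int(event)} {event.name} {message}"


def log_event(logger: logging.Logger, event: EventID, message: str) -> None:
    """Log an event; the event definition itself carries no severity."""
    logger.info(format_event(event, message))
```

Member names are kept in the same CamelCase as the C# enum so that a search term such as `BasketItem` matches regardless of which service wrote the line.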

Page 86: Un-broken Logging - TechnologyUG - Leeds - Matthew Skelton

BasketItemAdded = 60001

BasketItemRemoved = 60002

Page 87: Un-broken Logging - TechnologyUG - Leeds - Matthew Skelton

represent distinct states

Page 89: Un-broken Logging - TechnologyUG - Leeds - Matthew Skelton

OrderSvc_BasketItemAdded

Page 91: Un-broken Logging - TechnologyUG - Leeds - Matthew Skelton

Monolith to microservices: the debugger does not have the full view

Page 92: Un-broken Logging - TechnologyUG - Leeds - Matthew Skelton

Even with a remote debugger, it’s boring to attach and detach

Page 94: Un-broken Logging - TechnologyUG - Leeds - Matthew Skelton

Storage I/O

Worker Job

Queue

Upload

Page 98: Un-broken Logging - TechnologyUG - Leeds - Matthew Skelton

Transaction tracing

Page 99: Un-broken Logging - TechnologyUG - Leeds - Matthew Skelton

‘Unique-ish’ identifier for each request

Passed through downstream layers
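
One possible way to carry a ‘unique-ish’ identifier through the layers of a single Python process is sketched below, assuming the standard library `contextvars` and `logging` modules. The names `request_id_var`, `RequestIdFilter`, `handle_request` and `store_item` are hypothetical; across process boundaries the ID would also need to travel in a message header or similar, which is not shown here.

```python
import contextvars
import logging
import uuid
from typing import Optional

# Holds the ID of the request currently being handled ("-" when none is set).
request_id_var = contextvars.ContextVar("request_id", default="-")


class RequestIdFilter(logging.Filter):
    """Copy the current request ID onto every log record."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.request_id = request_id_var.get()
        return True


def new_request_id() -> str:
    """Short random token: unique enough for tracing, not globally unique."""
    return uuid.uuid4().hex[:12]


def store_item(logger: logging.Logger) -> None:
    """Downstream layer: logs without being handed the ID explicitly."""
    logger.info("MessageQueued")


def handle_request(logger: logging.Logger, incoming_id: Optional[str] = None) -> str:
    """Reuse an upstream ID if one arrived, otherwise create one."""
    request_id = incoming_id or new_request_id()
    token = request_id_var.set(request_id)
    try:
        logger.info("BasketItemAdded")
        store_item(logger)
    finally:
        request_id_var.reset(token)  # do not leak the ID into the next request
    return request_id
```

Attaching `RequestIdFilter` to a handler and adding `%(request_id)s` to its format string would then put the same ID on every line written while that request is being handled.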

Page 100: Un-broken Logging - TechnologyUG - Leeds - Matthew Skelton

Unique-ish ID

Page 101: Un-broken Logging - TechnologyUG - Leeds - Matthew Skelton

What about APM?

Page 102: Un-broken Logging - TechnologyUG - Leeds - Matthew Skelton

APM gives us application insight

BUT

How much do we learn? Is APM available on the Dev box?

It’s not just ‘an Ops problem’!

Page 104: Un-broken Logging - TechnologyUG - Leeds - Matthew Skelton

Helps us to understand how the software really works

Small overhead is worth it

Page 105: Un-broken Logging - TechnologyUG - Leeds - Matthew Skelton

Configurable severity levels

Page 106: Un-broken Logging - TechnologyUG - Leeds - Matthew Skelton

Which log level is right?

Page 107: Un-broken Logging - TechnologyUG - Leeds - Matthew Skelton

DEBUG, INFO, WARNING, ERROR, CRITICAL

Page 108: Un-broken Logging - TechnologyUG - Leeds - Matthew Skelton

Log level should *not* be fixed at compile or build time!

Page 109: Un-broken Logging - TechnologyUG - Leeds - Matthew Skelton

Tune log levels

Page 112: Un-broken Logging - TechnologyUG - Leeds - Matthew Skelton

{
  "eventmappings": {
    "events": {
      "event": [
        {
          "id": "CacheServiceStarted",
          "severity": { "level": "Information" }
        },
        {
          "id": "PageCachePurged",
          "severity": { "level": "Debug" },
          "state": { "enabled": false }
        },
        {
          "id": "DatabaseConnectionTimeOut",
          "severity": { "level": "Error" }
        }
      ]
    }
  }
}
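
A mapping file shaped like the one on this slide could be consumed roughly as follows. This is a sketch in Python under stated assumptions: the translation of the level names to Python's `logging` levels, the default of `enabled: true` when no `state` is given, and the helper names `load_event_mappings` and `emit_event` are all illustrative choices, not taken from the talk.

```python
import json
import logging

# Same shape as the slide's mapping, inlined so the sketch is self-contained.
CONFIG = """
{
  "eventmappings": { "events": { "event": [
    { "id": "CacheServiceStarted", "severity": { "level": "Information" } },
    { "id": "PageCachePurged", "severity": { "level": "Debug" },
      "state": { "enabled": false } },
    { "id": "DatabaseConnectionTimeOut", "severity": { "level": "Error" } }
  ] } }
}
"""

# Assumed translation from the config's level names to Python logging levels.
LEVELS = {
    "Debug": logging.DEBUG,
    "Information": logging.INFO,
    "Warning": logging.WARNING,
    "Error": logging.ERROR,
    "Critical": logging.CRITICAL,
}


def load_event_mappings(text: str) -> dict:
    """Return {event_id: (level, enabled)} parsed from the JSON text."""
    events = json.loads(text)["eventmappings"]["events"]["event"]
    return {
        e["id"]: (
            LEVELS[e["severity"]["level"]],
            e.get("state", {}).get("enabled", True),
        )
        for e in events
    }


def emit_event(logger, mappings, event_id, message=""):
    """Log event_id at its configured level; unmapped events default to INFO.

    Returns False when the event is disabled in the config.
    """
    level, enabled = mappings.get(event_id, (logging.INFO, True))
    if enabled:
        logger.log(level, "%s %s", event_id, message)
    return enabled
```

Because the calling code only names the event, changing a severity or silencing a noisy event becomes an edit to the mapping file rather than a rebuild.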

Page 114: Un-broken Logging - TechnologyUG - Leeds - Matthew Skelton

Tune severity levels of specific event IDs

Page 116: Un-broken Logging - TechnologyUG - Leeds - Matthew Skelton

Event tracing

Use enumerations (or closest thing)

Technical and Domain event types

Distributed systems: debuggers less useful

Trace calls with ‘unique-enough’ handles

Tune log levels via config

Page 117: Un-broken Logging - TechnologyUG - Leeds - Matthew Skelton

Log aggregation & search tools

Page 122: Un-broken Logging - TechnologyUG - Leeds - Matthew Skelton

Design for log aggregation

Page 123: Un-broken Logging - TechnologyUG - Leeds - Matthew Skelton

develop the software using log aggregation as a first-class thing

Page 126: Un-broken Logging - TechnologyUG - Leeds - Matthew Skelton

stories for testing logging
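
A story for testing logging can end in an ordinary automated test. The sketch below uses Python's standard `unittest.TestCase.assertLogs`; the `add_to_basket` function and the `basket` logger name are hypothetical stand-ins for real application code.

```python
import logging
import unittest

logger = logging.getLogger("basket")


def add_to_basket(basket: list, item: str) -> None:
    """Hypothetical domain function that logs a BasketItemAdded event."""
    basket.append(item)
    logger.info("BasketItemAdded item=%s", item)


class BasketLoggingTest(unittest.TestCase):
    def test_adding_item_logs_event(self):
        basket = []
        # assertLogs fails the test if nothing is logged at INFO or above.
        with self.assertLogs("basket", level="INFO") as captured:
            add_to_basket(basket, "book")
        self.assertEqual(basket, ["book"])
        self.assertEqual(
            captured.output, ["INFO:basket:BasketItemAdded item=book"]
        )
```

Saved in a module, this would run with `python -m unittest`. The test fails if the event stops being logged, which treats the log line as part of the behaviour rather than as an afterthought.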

Page 127: Un-broken Logging - TechnologyUG - Leeds - Matthew Skelton

BasketItemAdded

grep BasketItem
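
The same search can be sketched in Python to show why consistent event names pay off: one prefix finds the whole family of basket events. The line layout (`timestamp event_name details`) and the sample lines are invented for illustration only.

```python
from typing import Iterable, List


def find_events(lines: Iterable[str], name_prefix: str) -> List[str]:
    """Return lines whose event-name field starts with name_prefix.

    Assumes a hypothetical 'timestamp event_name details...' layout.
    """
    matches = []
    for line in lines:
        fields = line.split()
        if len(fields) >= 2 and fields[1].startswith(name_prefix):
            matches.append(line)
    return matches


# Illustrative sample lines, not real log output.
SAMPLE = [
    "2015-10-22T19:00:01Z BasketItemAdded sku=123",
    "2015-10-22T19:00:02Z PageGenerationStarted path=/basket",
    "2015-10-22T19:00:05Z BasketItemRemoved sku=123",
]

basket_lines = find_events(SAMPLE, "BasketItem")
```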

Page 128: Un-broken Logging - TechnologyUG - Leeds - Matthew Skelton

logging is (‘just’) another system component

Page 129: Un-broken Logging - TechnologyUG - Leeds - Matthew Skelton

NTP

Page 130: Un-broken Logging - TechnologyUG - Leeds - Matthew Skelton

Dev and Ops collaboration*

* and testers too!

Page 132: Un-broken Logging - TechnologyUG - Leeds - Matthew Skelton

Where?

Page 134: Un-broken Logging - TechnologyUG - Leeds - Matthew Skelton

auditing

compliance

pre-emptive fault diagnosis

performance

metrics…

Page 135: Un-broken Logging - TechnologyUG - Leeds - Matthew Skelton

Recap

Page 136: Un-broken Logging - TechnologyUG - Leeds - Matthew Skelton

Logging is often unloved

1. Discontinuous

2. Errors only, or arbitrary

3. ‘Bolted on’

4. No aggregation & search

5. Specify severity up front

Page 137: Un-broken Logging - TechnologyUG - Leeds - Matthew Skelton

How to make logging awesome

1. Continuous event IDs

2. Transaction tracing

3. Log aggregation & search tools

4. Design for logging

5. Decoupled severity

Page 138: Un-broken Logging - TechnologyUG - Leeds - Matthew Skelton

logging makes things work

Page 139: Un-broken Logging - TechnologyUG - Leeds - Matthew Skelton

“There is no thought behind aspect-oriented programming”

Page 140: Un-broken Logging - TechnologyUG - Leeds - Matthew Skelton

MINDFUL LOGGING (?!)

Page 141: Un-broken Logging - TechnologyUG - Leeds - Matthew Skelton

database transaction logs

Page 142: Un-broken Logging - TechnologyUG - Leeds - Matthew Skelton

‘Structured Logging’

ThoughtWorks Technology Radar: “Adopt” (May 2015)

https://www.thoughtworks.com/radar/techniques/structured-logging

http://gregoryszorc.com/

.NET: http://serilog.net/

Java: https://github.com/fluent/fluent-logger-java
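
For comparison with the .NET and Java libraries listed here, structured logging can be approximated in Python with the standard `logging` module alone. This is a sketch: `JsonFormatter` is a hypothetical class, and passing data through `extra={"fields": {...}}` is a convention assumed for this example rather than a standard one. A timestamp is left out to keep the output deterministic; a real formatter would normally include one.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object on one line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "event": record.getMessage(),
        }
        # Structured fields supplied by the caller via extra={"fields": {...}}.
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload, sort_keys=True)
```

A call such as `logger.info("BasketItemAdded", extra={"fields": {"sku": "123"}})`, on a logger whose handler uses this formatter, would produce a line that an aggregation tool can index by field instead of parsing free text.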

Page 143: Un-broken Logging - TechnologyUG - Leeds - Matthew Skelton

sanity

Page 145: Un-broken Logging - TechnologyUG - Leeds - Matthew Skelton

More

Ditch the Debugger and Use Log Analysis Instead

Matthew Skelton

https://blog.logentries.com/2015/07/ditch-the-debugger-and-use-log-analysis-instead/

Page 146: Un-broken Logging - TechnologyUG - Leeds - Matthew Skelton

More

Using Log Aggregation Across Dev & Ops: The Pricing Advantage

Rob Thatcher

https://blog.logentries.com/2015/08/using-log-aggregation-across-dev-ops-the-pricing-advantage/

Page 147: Un-broken Logging - TechnologyUG - Leeds - Matthew Skelton

Evan Phoenix (@evanphx)

youtube.com/watch?v=Z-JskKlIBOA

Page 148: Un-broken Logging - TechnologyUG - Leeds - Matthew Skelton

Books

operabilitybook.com

operationalfeatures.com

Page 149: Un-broken Logging - TechnologyUG - Leeds - Matthew Skelton

Thank you

http://skeltonthatcher.com/

@SkeltonThatcher

+44 (0)20 8242 4103

@matthewpskelton