the log - bristech

The Log: consistency, scalability, discoverability through simplicity. Roja Buck

Upload: anthony-roja-buck

Post on 13-Feb-2017

TRANSCRIPT

Page 1: The Log - Bristech

The Log

consistency, scalability, discoverability through simplicity

Roja Buck

Page 2: The Log - Bristech

Agenda

● What is a log?

● What makes logs interesting?

● What can logs facilitate?

Page 3: The Log - Bristech

What is a log?

It’s not what this chap makes...

Page 4: The Log - Bristech

What is a log?

Not what devs debug with...

Page 5: The Log - Bristech

What is a log?

Definition:

append-only, time[1]-ordered sequence

[1] Time here can be treated as abstract and disconnected from any wall-clock time; it is only relevant with respect to causal dependency.
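As a rough illustration of the definition above, here is a minimal in-memory sketch. The `Log` class and its method names are hypothetical, chosen only for this example; the offset returned by `append` plays the role of the abstract "time" in the footnote.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class Log:
    """Append-only sequence; an entry's offset is its logical timestamp."""
    _entries: List[Any] = field(default_factory=list)

    def append(self, entry: Any) -> int:
        """Add an entry at the end and return its offset."""
        self._entries.append(entry)
        return len(self._entries) - 1

    def read(self, start: int = 0) -> List[Any]:
        """Return a copy of all entries from offset `start` onwards."""
        return list(self._entries[start:])

    def __len__(self) -> int:
        return len(self._entries)


# Usage: offsets order the entries without reference to any wall clock.
log = Log()
first = log.append({"type": "user_created", "user": "alice"})
second = log.append({"type": "order_placed", "user": "alice"})
```

There is deliberately no update or delete operation: existing entries are never changed, only new ones added.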

Page 6: The Log - Bristech

What is a log?

Audience Participation!

● Who has considered using a log as a data-structure?

● Who has used log data-structures to solve real technical challenges?

Page 7: The Log - Bristech

What is a log?

Nothing new… Seriously

append-only, time-ordered sequence

seriously… this is what the talk is about… a base data-structure. Excited much?!

Page 8: The Log - Bristech

What is a log?

Nothing new… Seriously

Page 9: The Log - Bristech

What is a log?

I haven’t seen them used...

I believe that logs are under-utilised by the vast majority of web engineers, especially when the theoretical domain offers powerful use cases for the systems they build.

Page 10: The Log - Bristech

What is a log?

Remember this?

● Who has used log data-structures to solve real technical challenges?

Page 11: The Log - Bristech

What is a log?

...you probably have!

● Most data-stores rely on multiple forms of logs (WAL, log-shipping, _changes)

● Many domains of distributed systems theory solve problems involving logs (Multi-Paxos, Raft, ZooKeeper)

Page 12: The Log - Bristech

What is a log?

Used any of these recently?

Page 13: The Log - Bristech

So...

What makes them interesting?

Page 14: The Log - Bristech

What makes them interesting?

Powerful

● Serialisation: a cornerstone of fault-tolerant systems

● Recovery: recording all inputs to a deterministic system as a time-ordered sequence allows input replay and so guarantees recoverability

● Availability: replaying the log against secondary nodes promotes availability
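The recovery and availability points can be sketched as follows, assuming a deterministic system whose only inputs come from the log. The event format and the `apply` and `replay` names are illustrative assumptions, not taken from the talk.

```python
from typing import Dict, Iterable, Tuple

# (operation, account, amount): a hypothetical input format for this sketch.
Event = Tuple[str, str, int]


def apply(state: Dict[str, int], event: Event) -> None:
    """Deterministically apply one logged input to the state."""
    op, account, amount = event
    if op == "deposit":
        state[account] = state.get(account, 0) + amount
    elif op == "withdraw":
        state[account] = state.get(account, 0) - amount
    else:
        raise ValueError(f"unknown operation: {op}")


def replay(events: Iterable[Event]) -> Dict[str, int]:
    """Rebuild state from nothing by applying every input in log order."""
    state: Dict[str, int] = {}
    for event in events:
        apply(state, event)
    return state


log = [("deposit", "alice", 100), ("withdraw", "alice", 30), ("deposit", "bob", 5)]
primary = replay(log)
secondary = replay(log)  # a standby node fed the same log
recovered = replay(log)  # a node rebuilt from scratch after a crash
```

Because `apply` is deterministic, every node that replays the same log ends up in the same state, which is what makes both recovery and standby replicas possible.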

Page 15: The Log - Bristech

What makes them interesting?

Powerful

Page 16: The Log - Bristech

What makes them interesting?

Flexible

● The state of a deterministic system built on a log is fully defined by a single number (a position in the log) and the log itself

● State defined by cumulative delta allows for point-in-time interrogation, e.g. at 12am yesterday, how many users had never ordered a t-shirt?
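A small sketch of point-in-time interrogation, using the t-shirt question above. The entry layout, the sample data, and the function names are assumptions made for illustration; the idea is that a timestamp is first mapped to a log position, and the state is then rebuilt from the entries before that position.

```python
from bisect import bisect_right
from typing import List, Set, Tuple

# (unix_timestamp, event_type, user); entries are appended in timestamp order.
Entry = Tuple[int, str, str]

LOG: List[Entry] = [
    (100, "user_registered", "alice"),
    (110, "user_registered", "bob"),
    (120, "tshirt_ordered", "alice"),
    (130, "user_registered", "carol"),
    (140, "tshirt_ordered", "bob"),
]


def offset_at(log: List[Entry], timestamp: int) -> int:
    """Number of entries recorded at or before `timestamp`."""
    return bisect_right([ts for ts, _, _ in log], timestamp)


def users_without_tshirt(log: List[Entry], offset: int) -> Set[str]:
    """Replay the first `offset` entries; return users with no t-shirt order."""
    registered: Set[str] = set()
    ordered: Set[str] = set()
    for _, event_type, user in log[:offset]:
        if event_type == "user_registered":
            registered.add(user)
        elif event_type == "tshirt_ordered":
            ordered.add(user)
    return registered - ordered


# Usage: ask the question as of timestamp 125, then as of "now".
then = users_without_tshirt(LOG, offset_at(LOG, 125))
now = users_without_tshirt(LOG, len(LOG))
```

A store that only kept the current answer could not be asked about timestamp 125 after the fact; the log plus an offset can.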

Page 17: The Log - Bristech

What makes them interesting?

Distributed

● A distributed log models the problem of consensus

● By combining a log with a consensus protocol you can build up a distributed system which exhibits consistency, or knowledge of its absence

● Once you can reach consensus within a distributed system, you can make overall progress

Page 18: The Log - Bristech

What makes them interesting?

Distributed

Page 19: The Log - Bristech

Very nice...

And what can they facilitate?

Page 20: The Log - Bristech

What can they facilitate?

Integration Challenge

● Most businesses hold vast untapped information that is unfortunately inaccessible for exploitation

● Traditionally, data-sharing is handled by ad-hoc ETL built by the consumer: slow, expensive and typically unreliable

Page 21: The Log - Bristech

What can they facilitate?

Integration Challenge

Page 22: The Log - Bristech

What can they facilitate?

Integration Challenge

Page 23: The Log - Bristech

What can they facilitate?

Log solution

● Log-oriented architectures decouple the data producer from the consumer, passing responsibility to the producer to “publish” changes

● Because logs are serialisable, data can be consumed without blocking other systems, and no individual consumer is capable of creating backpressure
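One way to picture this decoupling is to have each consumer keep its own read position in a shared log. The sketch below is in-memory and single-process; the `Consumer` class and its `poll` method are hypothetical names for this example, not the API of any particular log system.

```python
from typing import Any, Callable, List


class Consumer:
    """Reads a shared log at its own pace by tracking its own offset."""

    def __init__(self, log: List[Any], handler: Callable[[Any], None]) -> None:
        self._log = log
        self._handler = handler
        self.offset = 0  # index of the next entry this consumer will read

    def poll(self, max_entries: int) -> int:
        """Process up to `max_entries` new entries; return how many were handled."""
        batch = self._log[self.offset:self.offset + max_entries]
        for entry in batch:
            self._handler(entry)
        self.offset += len(batch)
        return len(batch)


log: List[str] = []  # the producer only ever appends here
fast_seen: List[str] = []
slow_seen: List[str] = []
fast = Consumer(log, fast_seen.append)
slow = Consumer(log, slow_seen.append)

for n in range(5):
    log.append(f"order-{n}")  # publishing never waits on any consumer

fast.poll(max_entries=10)  # the fast consumer catches up completely
slow.poll(max_entries=2)   # the slow consumer lags without affecting anyone else
```

The producer never learns who is reading or how far behind they are, so a slow consumer delays only itself.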

Page 24: The Log - Bristech

What can they facilitate?

Log solution

Page 25: The Log - Bristech

What can they facilitate?

View Challenge

● Data is stored within a traditional database system in a form relevant to its use. When creating that view, not all information is retained, e.g. a database holding only “current_state”

Page 26: The Log - Bristech

What can they facilitate?

View Challenge

Page 27: The Log - Bristech

What can they facilitate?

View Challenge

Page 28: The Log - Bristech

What can they facilitate?

Log solution

● Through combining logs it is possible to build novel views on the encapsulated data

● Derivations are possible with any architecture, but a log-oriented one makes secondary views far more tractable

● Alternate views also benefit from knowing the “age” of their view, which gives automatic cache invalidation
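A derived view that knows its own "age" might look like the sketch below. The event shape and the `OrdersPerMerchant` class are assumptions for illustration; the point is that the view records how many log entries it has folded in, so staleness is a simple subtraction.

```python
from collections import Counter
from typing import List, Tuple

Event = Tuple[str, str]  # (merchant, product): a hypothetical order event


class OrdersPerMerchant:
    """A derived view that records how far into the log it has read."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()
        self.applied = 0  # number of log entries folded into the view so far

    def refresh(self, log: List[Event]) -> None:
        """Fold in any entries appended since the last refresh."""
        for merchant, _product in log[self.applied:]:
            self.counts[merchant] += 1
        self.applied = len(log)

    def lag(self, log: List[Event]) -> int:
        """How many entries behind the log this view is (its "age")."""
        return len(log) - self.applied


log: List[Event] = [("acme", "t-shirt"), ("acme", "mug")]
view = OrdersPerMerchant()
view.refresh(log)
log.append(("globex", "t-shirt"))
stale_by = view.lag(log)  # the view can tell it is one entry behind
view.refresh(log)
```

A cache built this way does not need a separate invalidation signal: comparing `applied` with the log length is enough to decide whether to refresh.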

Page 29: The Log - Bristech

What can they facilitate?

Log solution

Page 30: The Log - Bristech

What can they facilitate?

Examples:

● Want a website to display the order rate? Listen for order events published by the ordering system and aggregate

● Want personalisation to take account of profitability? Simply read in the finance feed and apply boosting to valuable products
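The order-rate example could be sketched roughly as below, assuming each order event carries a timestamp and arrives in log order. `OrderRate` and its methods are hypothetical names; a real site would feed `on_order` from whatever feed the ordering system publishes.

```python
from collections import deque
from typing import Deque


class OrderRate:
    """Counts order events seen within the last `window_seconds`."""

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        self._timestamps: Deque[float] = deque()

    def on_order(self, timestamp: float) -> None:
        """Call once per order event, in log order."""
        self._timestamps.append(timestamp)

    def per_window(self, now: float) -> int:
        """Number of orders in the window ending at `now`."""
        cutoff = now - self.window_seconds
        # Drop events that have fallen out of the window.
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        return len(self._timestamps)


rate = OrderRate(window_seconds=60)
for ts in [0, 10, 50, 65]:
    rate.on_order(ts)
recent = rate.per_window(now=70)
```

The ordering system needs no knowledge of the website; the aggregate is just another reader of the feed.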

Page 31: The Log - Bristech

What can they facilitate?

Examples:

● Want to build metrics around individual merchants for up-sell? Pull in the web activity and order timings feeds, aggregate, and produce merchant-centric documents

● Full text search across all purchases? Follow the orders feed and push added line items into your favourite flavour of Lucene
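For the search example, the sketch below substitutes a toy inverted index for Lucene so that it stays self-contained; it is not Lucene's API. The feed layout and the `PurchaseIndex` class are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

LineItem = Tuple[int, str]  # (order_id, description): a hypothetical feed entry


class PurchaseIndex:
    """Toy inverted index kept up to date by following an orders feed."""

    def __init__(self) -> None:
        self._postings: Dict[str, Set[int]] = defaultdict(set)
        self.applied = 0  # number of feed entries indexed so far

    def follow(self, feed: List[LineItem]) -> None:
        """Index any line items appended since the last call."""
        for order_id, description in feed[self.applied:]:
            for term in description.lower().split():
                self._postings[term].add(order_id)
        self.applied = len(feed)

    def search(self, query: str) -> Set[int]:
        """Return ids of orders whose line items contain every query term."""
        terms = query.lower().split()
        if not terms:
            return set()
        result = set(self._postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self._postings.get(term, set())
        return result


feed: List[LineItem] = [(1, "Red T-Shirt"), (2, "Blue mug"), (3, "Blue t-shirt")]
index = PurchaseIndex()
index.follow(feed)
```

Swapping the toy index for a real search engine would change only what `follow` does with each line item, not how the feed is read.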

Page 32: The Log - Bristech

What can they facilitate?

Scaling Challenge

● Ad-hoc integration model moves towards O(N²) connections between dependent system components

Page 33: The Log - Bristech

What can they facilitate?

Scaling Challenge

Page 34: The Log - Bristech

What can they facilitate?

Log Solution

● Each system requires only a single pipeline to write to the log and a single pipeline to read from it

● Scaling requires adding more consumers, or materialisers. Adding new data centres becomes largely a process of log shipping

● Whole system can be visualised as an eventually consistent database. All the materialisations are simply specialised indexes and views over the data
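To put rough numbers on the scaling argument, the sketch below compares the worst case of ad-hoc integration, where every system feeds every other system directly, with a shared log where each system has one write pipeline and one read pipeline. The function names are hypothetical, and the worst-case assumption is ours.

```python
def point_to_point_links(n: int) -> int:
    """Worst case for ad-hoc integration: every system feeds every other one."""
    return n * (n - 1)


def log_links(n: int) -> int:
    """With a shared log: one write pipeline and one read pipeline per system."""
    return 2 * n


# Usage: compare the two models for ten systems.
adhoc = point_to_point_links(10)
shared = log_links(10)
```

The first count grows quadratically with the number of systems and the second linearly, which is the gap the scaling challenge describes.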

Page 35: The Log - Bristech

What can they facilitate?

Log Solution

Page 36: The Log - Bristech

What can they facilitate?

So why logs?

● Handle data consistency by sequencing events and distributing the sequence

● Simple scalability through replicating a single data structure

● Decouple consumers, trivialising integrations

● Facilitates new views on data through new materialisers

● Availability is simply a matter of adding an n-th reader

Page 37: The Log - Bristech

What can they facilitate?

Audience Participation!

● Who thinks they will take a look at building systems based on logs?

Page 38: The Log - Bristech

What can they facilitate?

Further Reading

The Log: What every software engineer should know about real-time data's unifying abstraction

https://goo.gl/eWB17o

Page 39: The Log - Bristech

Thank you. Thoughts?

Roja Buck