@jna_sh
The LISP in the Machine
</speaker>
</content>
<content>
Braintree
</content>
<content>
Thanks for having me!
</content>
<content>
A story about Data at Braintree
</content>
<content>
Why me?
</content>
<content>
I’m not a systems engineer
</content>
<content>
I don’t write Clojure
</content>
<content>
I do write Haskell.
</content>
<content>
…and Node.js
</content>
<content>
I love cool tech.
</content>
<content>
Especially FP.
</content>
<content>
Braintree = Rails
</content>
<content>
Go
</content>
<content>
Haskell!!!1one
</content>
<content>
Clojure
</content>
<content>
Payments processor
</content>
<content>
Uber | Airbnb | Minecraft
</content>
<content>
1 million rides a day.
</content>
<content>
Vast amounts of data.
</content>
<content>
Powered by Clojure
</content>
<content>
Building a real-time data pipeline.
</content>
<content>
Once upon a time…
</content>
<content>
Primary DB
Data Warehouse
</content>
<content>
Backup / duplication
</content>
<content>
Source of truth
</content>
<content>
Low impact to live transactions.
Amazon Redshift
</content>
<content>
Batch Processes
</content>
<content>
updatedAt/createdAt
</content>
<content>
Batch updates:
</content>
<content>
Slow.
</content>
<content>
Unpredictable.
</content>
<content>
Can’t track deletes.
</content>
<content>
Missed updates.
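A minimal sketch of why polling on `updatedAt` loses data, assuming a hypothetical in-memory table and a batch job that copies rows whose `updated_at` is strictly greater than the last watermark (the talk does not show Braintree's actual job). A deleted row is simply gone from the source, so no query ever returns it, and a row committed with the watermark's timestamp after the previous batch ran is skipped.

```python
from dataclasses import dataclass


@dataclass
class Row:
    id: int
    updated_at: int  # hypothetical integer timestamp, e.g. seconds


def batch_sync(source, warehouse, watermark):
    """Copy rows changed since the watermark and return the new watermark.

    Hypothetical sketch of an updatedAt-polling batch job, not Braintree's code.
    """
    changed = [r for r in source.values() if r.updated_at > watermark]
    for row in changed:
        warehouse[row.id] = row
    return max((r.updated_at for r in changed), default=watermark)


source = {1: Row(1, 100), 2: Row(2, 100)}
warehouse = {}
wm = batch_sync(source, warehouse, watermark=0)  # copies rows 1 and 2, wm = 100

# Between batches: row 2 is deleted, and row 3 commits with the same timestamp
# as the watermark (e.g. a transaction that started before the batch ran).
del source[2]
source[3] = Row(3, 100)
wm = batch_sync(source, warehouse, wm)

print(sorted(warehouse))  # [1, 2]: the deleted row lingers, row 3 never arrives
```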
</content>
<content>
Burden of Knowledge
</content>
<content>
Search
Architecture
Primary DB → Batch process → Redshift data warehouse
Transaction search
</content>
<content>
PGQ
Queuing system on top of Postgres
</content>
<content>
ACID
</content>
<content>
Doesn’t block live transactions.
</content>
<content>
Elasticsearch
</content>
<content>
Another place to sync data.
Architecture
Primary DB → PGQ → Redshift data warehouse
Elasticsearch
</content>
<content>
PGQ prioritises DB integrity
</content>
<content>
Potential for lost messages.
</content>
<content>
Redshift & Elasticsearch…
</content>
<content>
…Fall over quite often.
</content>
<content>
Where do we persist our messages?
</content>
<content>
Enter Kafka.
</content>
<content>
Apache | LinkedIn
</content>
<content>
Pub/sub messaging system
</content>
<content>
Cluster of Kafka nodes
</content>
<content>
Multiple producers, multiple consumers
</content>
<content>
ZooKeeper: manages failure states
</content>
<content>
Topics
</content>
<content>
Categories of Messages
</content>
<content>
Partitions
</content>
<content>
Split across machines
</content>
<content>
No rules on topic writing.
</content>
<content>
Messages given an offset.
</content>
<content>
Deleted after a retention time set by the user.
</content>
<content>
Kafka properties:
</content>
<content>
Replays
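A toy model of offsets and replays, assuming a single partition held in a Python list; `PartitionLog` and `Consumer` are hypothetical names, not Kafka's client API. Each message's offset is its position in the log, each consumer tracks its own offset, and replaying after a downstream failure is just seeking back to an earlier offset while the messages are still retained.

```python
class PartitionLog:
    """Toy append-only log for one partition (hypothetical, not Kafka's API)."""

    def __init__(self):
        self._messages = []  # the message at index i has offset i

    def append(self, message):
        self._messages.append(message)
        return len(self._messages) - 1  # offset assigned to this message

    def read_from(self, offset):
        return self._messages[offset:]


class Consumer:
    """Each consumer owns its position in the log, independent of others."""

    def __init__(self, log):
        self.log = log
        self.offset = 0  # next offset to read

    def poll(self):
        batch = self.log.read_from(self.offset)
        self.offset += len(batch)
        return batch

    def seek(self, offset):
        self.offset = offset  # rewinding the offset is what enables a replay


log = PartitionLog()
for event in ["created", "authorized", "settled"]:
    log.append(event)

redshift = Consumer(log)
print(redshift.poll())  # ['created', 'authorized', 'settled']
redshift.seek(1)        # e.g. Redshift fell over; replay from offset 1
print(redshift.poll())  # ['authorized', 'settled']
```

Because the log rather than the consumer holds the messages, a Redshift consumer and an Elasticsearch consumer can each fall over and catch up independently.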
</content>
<content>
Strongly ordered
</content>
<content>
But only within a partition.
Architecture
Primary DB → PGQ → Kafka → Redshift data warehouse
Elasticsearch
</content>
<content>
Gateway
</content>
<content>
Databases sharded by Merchant
</content>
<content>
Partitions fed by database shard
</content>
<content>
Strong ordering per merchant
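A sketch of how sharding by merchant yields per-merchant ordering, assuming one partition per database shard and a hypothetical `shard_for` function built on CRC32 (this is neither Braintree's shard mapping nor Kafka's default partitioner). Since every event for a merchant passes through the same shard and therefore the same partition, Kafka's per-partition ordering becomes per-merchant ordering.

```python
import zlib
from collections import defaultdict

NUM_SHARDS = 4  # hypothetical; one partition per database shard


def shard_for(merchant_id: str) -> int:
    """Hypothetical stable sharding: a merchant always maps to the same shard."""
    return zlib.crc32(merchant_id.encode("utf-8")) % NUM_SHARDS


def publish(events):
    """Append each (merchant_id, event) to the partition fed by its merchant's shard."""
    partitions = defaultdict(list)
    for merchant_id, event in events:
        partitions[shard_for(merchant_id)].append((merchant_id, event))
    return partitions


events = [
    ("merchant-a", "sale-1"),
    ("merchant-b", "sale-1"),
    ("merchant-a", "refund-1"),
    ("merchant-b", "void-1"),
]
partitions = publish(events)

# merchant-a's events all sit in one partition, in the order they were produced.
a_events = [e for m, e in partitions[shard_for("merchant-a")] if m == "merchant-a"]
print(a_events)  # ['sale-1', 'refund-1']
```

Ordering across merchants is not guaranteed, but nothing in this pipeline needs it.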
</content>
<content>
Categorise by semantics of data
</content>
<content>
Redshift needs shape of data
</content>
<content>
Elasticsearch needs meaning
</content>
<content>
topics = datastream | eventstream
</content>
<content>
Cool! Job done.
</content>
<content>
Time to build it.
</content>
<content>
Clojure.
</content>
<content>
Why Clojure?
</content>
<content>
Rails
</content>
<content>
No Lisp
</content>
<content>
No JVM
</content>
<content>
Because reasons
</content>
<content>
JVM
</content>
<content>
Kafka, ZooKeeper and Elasticsearch all run on the JVM
</content>
<content>
Laziness
</content>
<content>
Infinite Lazy Streams
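Clojure's lazy sequences let a consumer treat a topic as an infinite stream and realise only as much of it as it needs. The sketch below shows the same idea with a Python generator as an analogue (it is not Clojure code); `fetch_batch` is a hypothetical stand-in for polling a Kafka partition.

```python
from itertools import islice


def fetch_batch(offset, size=3):
    """Hypothetical stand-in for polling a partition starting at `offset`."""
    fetch_batch.calls += 1
    return [f"msg-{i}" for i in range(offset, offset + size)]


fetch_batch.calls = 0  # counts how many batches were actually fetched


def message_stream(start=0, size=3):
    """Infinite lazy stream: a new batch is fetched only when the consumer needs it."""
    offset = start
    while True:
        batch = fetch_batch(offset, size)
        yield from batch
        offset += len(batch)


first_five = list(islice(message_stream(), 5))
print(first_five)         # ['msg-0', 'msg-1', 'msg-2', 'msg-3', 'msg-4']
print(fetch_batch.calls)  # 2: only two batches were ever fetched
```

The stream is conceptually endless, yet only the work needed for five messages is done, which also makes it easy to test a consumer against a short, finite prefix.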
</content>
<content>
Testing
</content>
<content>
Concurrency
</content>
<content>
Threads
</content>
<content>
Goroutines
</content>
<content>
Actors
</content>
<content>
Built-in shutdown logic
</content>
<content>
Status of actors
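The talk does not name the actor library used, so this is only a Python sketch of the properties the slides list: one mailbox per actor, built-in graceful shutdown via a sentinel message, and an observable status. `Actor` and the handler are hypothetical.

```python
import queue
import threading

_STOP = object()  # sentinel ("poison pill") telling the actor to shut down


class Actor:
    """Minimal actor: one thread, one mailbox, observable status (hypothetical)."""

    def __init__(self, handler):
        self._handler = handler
        self._mailbox = queue.Queue()
        self.status = "starting"
        self.processed = 0
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send(self, message):
        self._mailbox.put(message)

    def _run(self):
        self.status = "running"
        while True:
            message = self._mailbox.get()
            if message is _STOP:
                break
            self._handler(message)
            self.processed += 1
        self.status = "stopped"

    def stop(self, timeout=None):
        """Graceful shutdown: messages already queued are handled before stopping."""
        self._mailbox.put(_STOP)
        self._thread.join(timeout)


seen = []
indexer = Actor(seen.append)  # e.g. a hypothetical Elasticsearch writer
for msg in ["txn-1", "txn-2", "txn-3"]:
    indexer.send(msg)
indexer.stop()
print(indexer.status, indexer.processed, seen)  # stopped 3 ['txn-1', 'txn-2', 'txn-3']
```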
</content>
<content>
Single merchant
</content>
<content>
Offload work to Kafka
</content>
<content>
Elasticsearch aliases
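One common use of Elasticsearch aliases, assumed here since the slide gives no detail, is zero-downtime reindexing: searches query an alias, a new index is built in the background, and the alias is repointed in a single atomic `_aliases` request. The index and alias names below are hypothetical, and sending the request is left to whichever HTTP client is in use.

```python
import json


def alias_swap_body(alias, old_index, new_index):
    """Body for POST /_aliases: atomically repoint `alias` from old_index to new_index."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


# Hypothetical names: searches always query "transactions"; a rebuilt index is
# populated in the background, then the alias is swapped in one request.
body = alias_swap_body("transactions", "transactions_v1", "transactions_v2")
print(json.dumps(body, indent=2))
```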
</content>
<content>
What did we learn?
</content>
<content>
Garbage Collection
</content>
<content>
Boo hiss
</content>
<content>
Keep it small
</content>
<content>
G1GC
</content>
<content>
Heap size is important
</content>
<content>
Smaller = Better
</content>
<content>
Monitor all the things
</content>
<content>
Don’t use default configs
</content>
<content>
Use a good concurrency model
</content>
<content>
Future gains
</content>
<content>
Real-time source of truth
</content>
<content>
Real-time fraud monitoring
</content>
<content>
Real-time reports
</content>
<content>
Thank you!
</speaker>