CQRS and Event Sourcing with MongoDB and PHP
TRANSCRIPT
About me
Davide Bellettini
● Developer at Onebip
● TDD addict
@SbiellONE — about.bellettini.me
What is this talk about
A little bit of context
About Onebip
Mobile payment platform. Start-up born in 2005, acquired by the Neomobile group in 2011.
Onebip today:
- 70 countries
- 200+ carriers
- 5 billion potential users
LAMP stack
It all started with a Monolith
To a distributed system
Self-contained services communicating via REST
First-class modern distributed NoSQL DBs
Modern services
But the Monolith is still there
The problem
A reporting horror story
We need three new reports!
― Manager
Sure, no problem!
Deal with the legacy SQL schema
Deal with MongoDB
A little bit of queries here, a little bit of map-reduce there
1 month later...
Reports are finally ready!
until...
Your queries are killing production!
― SysAdmin
Still not enough!
Heavy query optimization, adding indexes
Let’s reuse data from other reports (don’t do that)
DB is ok, reports delivered.
but then...
Houston, we have a problem. Reports are not consistent (with other reports)
― Business guy
Mistakes were made
Lessons learned
It’s hard to compare different data in a distributed system split across multiple domains
#1 Avoid multiple sources of truth
Same words, different concepts across domains
#2 Ubiquitous language
Changing a report shouldn’t have side effects
#3 Fault tolerance to change
Most common solutions
#1 ETL + Map-Reduce
#2 Data Warehouse + Consultants
#3 Mad science (Yeppa!)
What we wanted
No downtime in production
Consistent across domains
Must have
A system elastic enough to extract any metric
Real time data
Nice to have
In DDD we found the light
CQRS and Event Sourcing
Command-Query Responsibility Segregation (CQRS)
Commands
Anything that happens in one of your domains is triggered by a command and generates one or more events.
Order received -> Payment sent -> Items queued -> Confirmation email sent
Query
Generate read models from events depending on how the data actually needs to be used (by users and other application internals)
Event Sourcing
The fundamental idea of Event Sourcing is that of ensuring every change to the state of an application is captured in an event object, and that these event objects are themselves stored in the sequence they were applied.
― Martin Fowler
Starting from the beginning of time, you are literally unrolling history to reach the state at a given point in time
Unrolling a stream of events
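To make the unrolling concrete, here is a minimal left-fold sketch in PHP; the event shapes and the apply() cases are hypothetical, echoing the event types named on the next slide:

<?php
// Rebuild state by folding over the full history of events.
// State is never stored; it is recomputed from the stream.
function replay(iterable $events): array
{
    $state = [];
    foreach ($events as $event) {
        $state = apply($state, $event);
    }
    return $state;
}

// One transition per event type (hypothetical shapes).
function apply(array $state, array $event): array
{
    switch ($event['type']) {
        case 'UserLoggedIn':
            $state['last_login'] = $event['data']['meta']['created_at'];
            break;
        case 'PaymentSent':
            $state['payments'][] = $event['data']['payload'];
            break;
    }
    return $state;
}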
Idea #1
Every change to the state of your application is captured in an event object.
“UserLoggedIn”, “PaymentSent”, “UserLanded”
Idea #2
Events are stored in the sequence they were applied inside an event store
Idea #3
Everything is an event. No more state.
Idea #4
One way to store data/events but potentially infinite ways to read them.
A practical example
Tech ops, business control, monitoring, accounting: they are all interested in reading data from different views.
Healthy NoSQL
You start with this
{
  "_id": ObjectId("123"),
  "username": "Flash",
  "city": …,
  "phone": …,
  "email": …
}
The more successful your company is, the more people…
The more people, the more views
With document DBs it's magically easy to add new fields to your collections.
Soon you might end up with
{
"_id": ObjectId("123"),
"username": "Flash",
"city": …,
"phone": …,
"email": …,
"created_at": …,
"updated_at": …,
"ever_tried_to_purchase_something": …,
"canceled_at": …,
"acquisition_channel": …,
"terminated_at": …,
"latest_purchase_date": …,
…
}
A bomb waiting to detonate
It’s impossible to keep adding state changes to your documents and then expect to be able to extract them with a single query.
Exploring Tools
Event Store
● Engineered for event sourcing
● Supports projections
● By the father of CQRS (Greg Young)
● Great performance
http://geteventstore.com/
The bad
Based on Mono, still too unstable.
LevelWHEN
An event store built with Node.js and LevelDB
● Faster than light
● Completely custom, no tools to handle aggregates
https://github.com/gabrielelana/levelWHEN
The known path
● PHP (any other language would do just fine)
● MongoDB 2.2.x
Why MongoDB
Events are not relational
Scales well
Awesome aggregation framework
Hands on
Storing Events
Service \
         \  [event payload]
          \
Service --- Queue System <------------> API -> MongoDB
          /
         /  [event payload]
        /
Service

The write architecture
Queues
Recruiter - https://github.com/gabrielelana/recruiter
MongoDB replica set
A MongoDB replica set with two logical DBs:
1. Event store, where we store events
2. Reporting DB, where we store aggregates and final reports
Anatomy of an event 1/2
{
  '_id': '3318c11e-fe60-4c80-a2b2-7add681492d9',
  'type': 'an-event-type',
  'data': {
    'meta': { … },
    'payload': { … }
  }
}
Anatomy of an event 2/2
'meta': {
  'created_at': ISODate("2014-11-21T00:00:01Z"),
  'save_date': ISODate("2014-11-21T00:00:02Z"),
  'source': 'some-bounded-context',
  'correlation_id': 'a-correlation-id'
},
'payload': {
  'user_id': '1234',
  'animal': 'unicorn',
  'colour': 'pink',
  'purchase_date': ISODate("2014-11-21T00:00:00Z"),
  'price': '20/fantaeuros'
}
Don’t trust the network: Idempotence
{
'_id' : '3318c11e-fe60-4c80-a2b2-7add681492d9',
…
}
The _id field is actually defined client-side and ensures idempotence if an event is received twice
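A minimal sketch of an idempotent insert; it assumes the modern mongodb/mongo-php-library (the talk used the 2014-era driver, whose API differs) and a hypothetical storeEvent() helper:

<?php
require 'vendor/autoload.php';

use MongoDB\Client;
use MongoDB\Driver\Exception\BulkWriteException;

function storeEvent(Client $client, array $event): void
{
    $events = $client->eventstore->events;
    try {
        // _id is a UUID generated by the producer, not by MongoDB,
        // so a replayed event hits the very same _id.
        $events->insertOne($event);
    } catch (BulkWriteException $e) {
        // Duplicate key (code 11000): the event was already stored,
        // so receiving it twice is a no-op.
        foreach ($e->getWriteResult()->getWriteErrors() as $error) {
            if ($error->getCode() !== 11000) {
                throw $e;
            }
        }
    }
}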
Indexes
● The events collection is huge (~100×N documents)
● Use indexes wisely, as they are necessary yet expensive
● With the suggested event structure (created as shown below):
  { 'data.meta.created_at': 1, 'type': 1 }
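Creating that compound index, sketched with the modern mongodb/mongo-php-library (the 2014-era equivalent would be MongoCollection::ensureIndex()):

<?php
require 'vendor/autoload.php';

$events = (new MongoDB\Client())->eventstore->events;

// Compound index matching the projector's filter: range scan on
// creation date, then filter by event type.
$events->createIndex(
    ['data.meta.created_at' => 1, 'type' => 1],
    ['background' => true] // don't block a huge live collection (pre-4.2 option)
);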
Benchmarking
How many events per second can you store?
Our machines were able to store roughly 150 events/sec. This number can be greatly increased with dedicated IOPS, more aggressive insert policies, etc.
Final tips
● Use SSDs on your storage machines
● Pay attention to write concerns (w=majority, see the sketch after this list)
● Test your replica set's fault tolerance
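What w=majority looks like in code, again sketched with the modern driver (hypothetical hosts): the insert is acknowledged only after a majority of replica-set members have the write, so a failover cannot silently lose events.

<?php
require 'vendor/autoload.php';

use MongoDB\Client;
use MongoDB\Driver\WriteConcern;

$client = new Client('mongodb://rs1,rs2,rs3/?replicaSet=events'); // hypothetical hosts
$events = $client->eventstore->events;

$event = ['_id' => 'client-generated-uuid', 'type' => 'an-event-type', 'data' => []];

$events->insertOne($event, [
    // Wait for a majority of members, with a 5s timeout.
    'writeConcern' => new WriteConcern(WriteConcern::MAJORITY, 5000),
]);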
From events to meaningful metrics
Sequential Projector -> Event Mapper -> Projection -> Aggregation
The event processing pipeline
A real life problem
What is the conversion rate of our registered users?
#1 The registration event
{
'_id' : '3318c11e-fe60-4c80-a2b2-7add681492d9',
'type': 'user-registered',
'data': {
'meta' : {
'save_date': ISODate("2014-11-21T00:00:09Z"),
'created_at': ISODate("2014-11-21T00:00:01Z"),
'source': 'core-domain',
'correlation_id': 'user-123456'
},
'payload' : {
'user_id': 123,
'username': 'flash',
'email': '[email protected]',
'country': 'IT'
}
}
}
#2 The purchase event
{
'_id' : 'a7f3c2d1-0b4e-4c80-a2b2-7add68149300',
'type': 'user-purchased',
'data': {
'meta' : {
'save_date': ISODate("2014-11-21T00:10:09Z"),
'created_at': ISODate("2014-11-21T00:10:01Z"),
'source': 'payment-gateway',
'correlation_id': 'user-123456'
},
'payload' : {
'user_id': 123,
'email': '[email protected]',
'amount': 20,
'value': 'EUR',
'payment': 'credit_card',
'item': 'fluffy cat'
}
}
}
Sequential projector 1/2
[]->[x]->[]->[x]->[]->[]->[]->[]
|--------------||--------------|
    batch #1        batch #2
        |
        v
    Projector

Divides the stream of events into batches, filters events by type and passes those of interest to the mapper
Sequential projector 2/2
● It’s a good idea to select fixed-size batches to avoid memory problems when you load your cursor into memory
● It could be a long-running process selecting events as they arrive in real time
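A minimal long-running projector sketch (hypothetical names, modern mongodb/mongo-php-library); mapEvent() and applyToProjection() are the mapper and projection steps sketched further below:

<?php
require 'vendor/autoload.php';

use MongoDB\BSON\UTCDateTime;
use MongoDB\Client;

const BATCH_SIZE = 1000;
const INTERESTING = ['user-registered', 'user-purchased'];

$events = (new Client())->eventstore->events;
$lastSeen = new UTCDateTime(0); // resume point; persist it in real life

while (true) {
    // Fixed-size batches keep memory bounded; the filter matches the
    // suggested {data.meta.created_at: 1, type: 1} index.
    $batch = $events->find(
        [
            'data.meta.created_at' => ['$gt' => $lastSeen],
            'type' => ['$in' => INTERESTING],
        ],
        ['sort' => ['data.meta.created_at' => 1], 'limit' => BATCH_SIZE]
    );

    foreach ($batch as $event) {
        applyToProjection(mapEvent((array) $event));
        $lastSeen = $event['data']['meta']['created_at'];
    }

    sleep(1); // poll: events are projected seconds after they arrive
}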
Event mapper 1/3
Translates event fields into the Read Model domain.
It takes an event as input, applies a bunch of logic and returns a list of Read Model fields.
Event mapper 2/3
Input event: user-registered
Output:
$output = [
'user_id' => 123, // simply copied
'user_name' => 'flash', // simply copied
'email' => '[email protected]', // simply copied
'registered_at' => "2014-11-21T00:00:01Z" // From the data.meta.created_at event field
];
Event mapper 3/3
Input event: user-purchased
Output:
$output = [
'user_id' => 123, // simply copied
'email' => '[email protected]', // simply copied
'purchased_at' => "2014-11-21T00:10:01Z" // From the data.meta.created_at event field
];
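A mapper sketch consistent with the two outputs above (hypothetical function, one case per event type):

<?php
function mapEvent(array $event): array
{
    $meta = $event['data']['meta'];
    $payload = $event['data']['payload'];

    switch ($event['type']) {
        case 'user-registered':
            return [
                'user_id'       => $payload['user_id'],
                'user_name'     => $payload['username'], // renamed for the read model
                'email'         => $payload['email'],
                'registered_at' => $meta['created_at'],
            ];
        case 'user-purchased':
            return [
                'user_id'      => $payload['user_id'],
                'email'        => $payload['email'],
                'purchased_at' => $meta['created_at'],
            ];
    }

    throw new InvalidArgumentException("Unmapped event type: {$event['type']}");
}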
Projection
Essentially, it is your read model: the data that the business is interested in.
The Projection after event #1
db.users_conversion_rate_projection.findOne()
{
'user_id': 123,
'user_name': 'flash',
'email': '[email protected]',
'registered_at': ISODate("2014-11-21T00:00:01Z")
}
The Projection after event #2
{
'user_id': 123,
'user_name': 'flash',
'email': '[email protected]',
'registered_at': ISODate("2014-11-21T00:00:01Z"),
'purchased_at': ISODate("2014-11-21T00:10:01Z") // Added this field and rewrote others
}
The Projection collection
{
'user_id': 123,
'user_name': 'flash',
'email': '[email protected]',
'registered_at': ISODate("2014-11-21"),
'purchased_at': ISODate("2014-11-21")
}
{
'user_id': 456,
'user_name': 'batman',
'email': '[email protected]',
'registered_at': ISODate("2014-11-21"),
'purchased_at': ISODate("2014-11-21")
}
{
'user_id': 789,
'user_name': 'superman',
'email': '[email protected]',
'registered_at': ISODate("2014-12-21"),
'purchased_at': ISODate("2014-12-21")
}
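"Added this field and rewrote others" translates to an upsert keyed on user_id; a hypothetical sketch with the modern library (replaying the same event rewrites the same values, so the projection stays idempotent too):

<?php
require 'vendor/autoload.php';

function applyToProjection(array $fields): void
{
    $projection = (new MongoDB\Client())->reporting
        ->users_conversion_rate_projection;

    $projection->updateOne(
        ['user_id' => $fields['user_id']],  // one document per user
        ['$set' => $fields],                // add new fields, rewrite existing ones
        ['upsert' => true]                  // create the document on the first event
    );
}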
The Projection - A few thoughts
Note that we didn't copy all the available fields from events to the projection, just the relevant ones.
From these two events we could have generated infinite read models such as
● List all purchased products and related amounts for the company buyers
● Map all sales and revenues for our accounting dept
● List transactions for the financial department
One way to write,infinite ways to read!
The aggregation (1) - Total registered users
var registered = db.users_conversion_rate_projection.aggregate([
{
$match: {
"registered_at": { $gte: ISODate("2015-11-21"), $lte: ISODate("2015-11-22") }
}
},
{
$group: {
_id: { },
count: { $sum: 1 }
}
}
]);
The aggregation (2) - Users with a purchase
var purchased = db.users_conversion_rate_projection.aggregate([
{
$match: {
"registered_at": { $gte: ISODate("2015-11-21"), $lte: ISODate("2015-11-22") },
"purchased_at": { $exists: true }
}
},
{
$group: {
_id: { },
count: { $sum: 1 }
}
}
]);
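Putting the two counts together answers the original question; a sketch of the same two filters in PHP (modern library), with the conversion rate computed client-side:

<?php
require 'vendor/autoload.php';

use MongoDB\BSON\UTCDateTime;

$projection = (new MongoDB\Client())->reporting->users_conversion_rate_projection;

$range = [
    '$gte' => new UTCDateTime(strtotime('2014-11-21') * 1000),
    '$lte' => new UTCDateTime(strtotime('2014-11-22') * 1000),
];

$registered = $projection->countDocuments(['registered_at' => $range]);
$purchased  = $projection->countDocuments([
    'registered_at' => $range,
    'purchased_at'  => ['$exists' => true],
]);

// Conversion rate: users who purchased out of users who registered.
$rate = $registered > 0 ? $purchased / $registered : 0.0;
printf("Conversion rate: %.1f%%\n", 100 * $rate);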
The aggregation (3) - Automate all the things
● You can easily create the aggregation framework statement by composition, abstracting the concept of a Column.
● This way you can dynamically aggregate your projections on (for example) an API request.
● If your Projector is a long-running process, your projections will be updated to the second and you automagically get realtime data.
Another use of events: Business & Tech Monitoring
Beware of the beast!
No Silver Bullet
Events are expensive
They require a lot of TIME to be parsed
Events are expensive
You will end up with a billion-document collection (and counting)
Fixing wrong events is painful
Events are complex
Moving events around is horribly painful
It will make your life incredibly difficult, with hidden bugs and leaking documentation.
Mongo won’t help you
Improvements
● Upgrade from MongoDB 2.2.x to 3.0.x
● Switch to the WiredTiger storage engine to save space
Credits
Based on a talk by Jacopo Nardiello
● Slides: http://bit.ly/es-nardiello-2014
● Video: https://vimeo.com/113370688
Q&A
@SbiellONE — about.bellettini.me
Thank you!