BUILDING A MISSION CRITICAL EVENT SYSTEM ON TOP OF MONGODB by @shahar_kedar

Upload: bigpanda

Post on 25-May-2015

DESCRIPTION

How we built a super fast, extremely reliable and highly available event system on top of MongoDB

TRANSCRIPT

Page 1: Building an event system on top MongoDB

BUILDING A MISSION CRITICAL EVENT SYSTEM ON TOP OF MONGODB

by @shahar_kedar

Page 2: Building an event system on top MongoDB

BIGPANDA

SaaS platform that lets companies aggregate alerts from all their monitoring systems into one place for faster incident discovery and response.

Page 3: Building an event system on top MongoDB

HOW IT WORKS

Events

• High CPU on prod-srv-1, 18/06/14 16:05, CRITICAL

• High CPU on prod-srv-1, 18/06/14 16:07, WARNING

• Memory usage on prod-srv-1, 18/06/14 16:08, CRITICAL

Entities

• High CPU on prod-srv-1: WARNING

• Memory usage on prod-srv-1: CRITICAL

Incidents

• 2 Alerts on prod-srv-1
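The roll-up on this slide can be sketched in a few lines of JavaScript. This is only an illustration of the idea, not BigPanda's pipeline: the event fields (host, check) and the grouping keys (host plus check for an entity, host for an incident) are assumptions made for the example.

```javascript
// Hypothetical event shape: { host, check, status, timestamp }.
// Assumed grouping: one entity per host + check, one incident per host.
function rollUp(events) {
  const entities = new Map();
  // Process oldest first so each entity ends up with its latest status.
  const ordered = [...events].sort((a, b) => a.timestamp - b.timestamp);
  for (const ev of ordered) {
    const key = `${ev.host}/${ev.check}`;
    let entity = entities.get(key);
    if (!entity) {
      entity = { host: ev.host, check: ev.check, start: ev.timestamp, events: [] };
      entities.set(key, entity);
    }
    entity.status = ev.status; // the latest event wins
    entity.events.push(ev);    // events are embedded in their entity
  }

  const incidents = new Map();
  for (const entity of entities.values()) {
    let incident = incidents.get(entity.host);
    if (!incident) {
      incident = { host: entity.host, entities: [] };
      incidents.set(entity.host, incident);
    }
    incident.entities.push({ check: entity.check, status: entity.status });
  }
  return { entities: [...entities.values()], incidents: [...incidents.values()] };
}

// The three events from the slide.
const sample = [
  { host: "prod-srv-1", check: "High CPU", status: "CRITICAL", timestamp: new Date("2014-06-18T16:05:00Z") },
  { host: "prod-srv-1", check: "High CPU", status: "WARNING", timestamp: new Date("2014-06-18T16:07:00Z") },
  { host: "prod-srv-1", check: "Memory usage", status: "CRITICAL", timestamp: new Date("2014-06-18T16:08:00Z") },
];

const result = rollUp(sample);
for (const incident of result.incidents) {
  console.log(`${incident.entities.length} Alerts on ${incident.host}`);
}
```

Three events collapse into two entities (the High CPU entity keeps only its latest status) and a single incident for the host.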

Page 4: Building an event system on top MongoDB

PRODUCT REQUIREMENTS

• Events need to be processed into incidents and streamed to the user’s browser as fast as possible

• Incidents need to reliably reflect the state as it is in the monitoring system

• The service has to be up and running 24x7

Page 5: Building an event system on top MongoDB

MISSION CRITICAL

• It’s not rocket science, it’s not Google, but:

• It has to be super fast

• It has to be extremely reliable

• It has to always be available

Page 6: Building an event system on top MongoDB

OUR #1 COMPETITOR

Page 7: Building an event system on top MongoDB

WHY MONGO?

BECAUSE IT’S WEB SCALE!

Page 8: Building an event system on top MongoDB

WHY MONGO?

At first:

• NodeJS shop

• Schemaless

• Easy to master

Later on:

• Reliable

• Easy to evolve

• Partial and atomic updates

• Powerful query language

BECAUSE IT’S WEB SCALE!

Page 9: Building an event system on top MongoDB

SUPER FAST

Hardware

Schema Design

Lean & Stream

Page 10: Building an event system on top MongoDB

HARDWARE

• 03/13: 3 x m1.medium

• 02/14: 1 x i2.xlarge + 2 x m1.medium

• 06/14: 2 x i2.xlarge + 1 x m3.xlarge

Instance types:

• m1.medium: 1 vCPU, 3.75GB RAM, EBS drive

• m3.xlarge: 4 vCPUs, 15GB RAM, EBS drive

• i2.xlarge: 4 vCPUs, 30.5GB RAM, SSD 800GB

x3 reads, x4 writes

Page 11: Building an event system on top MongoDB

“Schema design is … the largest factor when it comes to performance and scalability … more important than hardware, how you shard, or anything else, schema is by far the most important thing.”

–Eliot Horowitz

Page 12: Building an event system on top MongoDB

SCHEMA DESIGN

Event

    {
      timestamp: Date,
      status: String,
      description: String
    }

Entity

    {
      start: Date,
      end: Date,
      status: String,
      description: String,
      events: [ <embedded> ],
      source_system: String
    }

Incident

    {
      start: Date,
      end: Date,
      is_active: Boolean,
      description: String,
      entities: [ { entityId: ObjectId, status: String } ]
    }

Page 13: Building an event system on top MongoDB

DENORMALIZATION

• Go over the checklist (http://bit.ly/1vUdz2T)

• Incidents => Entities: partially embedded + ref

    • Cardinality: one-to-few

    • Direct access to Entities

    • Entities are frequently updated

• Entities => Events: embedded

    • Events are not directly accessed

    • Events are immutable

    • Cardinality: one-to-many ~ one-to-gazillion
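To make "partially embedded + ref" concrete, here is a small in-memory sketch. The Maps stand in for collections, and the ids, sample data and helper names are hypothetical. It shows the trade-off: an incident can be summarized from its embedded copies alone, at the price of writing a status change in two places.

```javascript
// Hypothetical in-memory stand-in for the entities collection.
const entities = new Map([
  ["e1", { _id: "e1", description: "High CPU on prod-srv-1", status: "CRITICAL", events: [] }],
  ["e2", { _id: "e2", description: "Memory usage on prod-srv-1", status: "CRITICAL", events: [] }],
]);

// The incident embeds only a reference and the status of each entity.
const incident = {
  description: "2 Alerts on prod-srv-1",
  is_active: true,
  entities: [
    { entityId: "e1", status: "CRITICAL" },
    { entityId: "e2", status: "CRITICAL" },
  ],
};

// Summaries are served from the embedded copies: no entity lookup needed.
function summarize(inc) {
  const counts = {};
  for (const ref of inc.entities) {
    counts[ref.status] = (counts[ref.status] || 0) + 1;
  }
  return counts;
}

// Cost of the duplication: a status change is written in both places.
function applyStatusChange(inc, entityId, status) {
  entities.get(entityId).status = status;
  const ref = inc.entities.find((r) => r.entityId === entityId);
  if (ref) ref.status = status;
}

// Full entity details are loaded through the reference only when needed.
function loadEntities(inc) {
  return inc.entities.map((ref) => entities.get(ref.entityId));
}

applyStatusChange(incident, "e1", "WARNING");
console.log(summarize(incident));
console.log(loadEntities(incident).map((e) => e.description));
```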

Page 14: Building an event system on top MongoDB

INDEXES

• Optimized indexes, using db.collection.find({..}).explain() to inspect query plans

• Removed redundant indexes

• Truncated events collections (TTL index)
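A TTL index is an ordinary single-field index on a date field with an expireAfterSeconds option; MongoDB then deletes documents once that field is older than the limit. The helper below only builds the arguments for createIndex. The field name and the 30-day retention are assumptions for illustration; the slides do not state the retention actually used.

```javascript
// Build the (keys, options) pair that describes a TTL index.
// dateField must hold a Date; documents expire `days` after that date.
function ttlIndexSpec(dateField, days) {
  return {
    keys: { [dateField]: 1 },
    options: { expireAfterSeconds: days * 24 * 60 * 60 },
  };
}

// Assumed values: field "timestamp", 30 days of retention.
const spec = ttlIndexSpec("timestamp", 30);
console.log(JSON.stringify(spec));

// With the official Node.js driver this would be applied roughly as:
//   await db.collection("events").createIndex(spec.keys, spec.options);
```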

Page 15: Building an event system on top MongoDB

LEAN QUERIES

• Use projections to limit the fields returned by a query: Model.find().select('-events')

• Mongoose users: use .lean() when possible for a performance boost of more than 50%: Model.find().lean()

• Stream results: Model.find().stream().on('data', function(doc){})
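The sketch below illustrates the two ideas without a database: dropping the heavy events field, and handling documents one at a time instead of buffering the whole result. The async generator is a stand-in for a real cursor, and all names are hypothetical. With Mongoose the projection runs on the server, and newer Mongoose versions expose streaming through .cursor() rather than .stream().

```javascript
// Stand-in for a database cursor: yields documents one at a time.
async function* fakeCursor(count) {
  for (let i = 0; i < count; i++) {
    yield {
      _id: i,
      status: i % 2 ? "CRITICAL" : "WARNING",
      events: new Array(1000).fill({}), // the heavy embedded array
    };
  }
}

// Client-side equivalent of select('-events'): drop the heavy field.
function withoutEvents(doc) {
  const { events, ...rest } = doc;
  return rest;
}

// Streaming: each document is handled as it arrives and then released,
// so memory use stays flat regardless of the result size.
async function countCritical(cursor) {
  let critical = 0;
  for await (const doc of cursor) {
    const lean = withoutEvents(doc);
    if (lean.status === "CRITICAL") critical++;
  }
  return critical;
}

countCritical(fakeCursor(10)).then((n) => console.log(`critical: ${n}`));
```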

Page 16: Building an event system on top MongoDB

RESULTS

• Average latency of all API calls went from 500ms to under 20ms

• Average latency of the full pipeline went from 2s to under 500ms

• Peak time latency of the full pipeline went down from 5 minutes (!!) to less than 30s

Page 17: Building an event system on top MongoDB

EXTREMELY RELIABLE

Atomic & Partial Updates

Page 18: Building an event system on top MongoDB

ATOMIC & PARTIAL UPDATES

• Several services might try to update the same document at the same time, but:

    • Different systems update different parts of the document

    • Updates to the same document are sharded and ordered at the application level (read our awesome blog post: http://bit.ly/1nQVcbS)
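One way to picture both points is a per-document queue combined with field-level writes. This is a single-process simplification with hypothetical names, not the mechanism described in the linked blog post: updates to the same id run strictly in submission order, and each update touches only the fields it names, much like a $set update would.

```javascript
// Serializes async tasks per key: tasks for the same document id run in
// submission order, while tasks for different ids do not wait on each other.
class KeyedQueue {
  constructor() {
    this.tails = new Map(); // key -> promise of the last queued task
  }
  push(key, task) {
    const prev = this.tails.get(key) || Promise.resolve();
    // Start after the previous task settles, whether it succeeded or failed.
    const next = prev.catch(() => {}).then(() => task());
    this.tails.set(key, next);
    return next;
  }
}

// In-memory stand-in for a collection; "assignee" is a made-up field.
const store = new Map([
  ["entity-1", { status: "CRITICAL", description: "High CPU", assignee: null }],
]);

// Stand-in for a partial update such as updateOne(filter, { $set: fields }):
// only the named fields change. delayMs simulates network latency.
async function setFields(id, fields, delayMs) {
  await new Promise((resolve) => setTimeout(resolve, delayMs));
  Object.assign(store.get(id), fields);
}

const queue = new KeyedQueue();
// Without ordering, the slow first write would land last and leave the
// status at WARNING. The queue guarantees the final status is OK.
const done = Promise.all([
  queue.push("entity-1", () => setFields("entity-1", { status: "WARNING" }, 20)),
  queue.push("entity-1", () => setFields("entity-1", { assignee: "oncall" }, 10)),
  queue.push("entity-1", () => setFields("entity-1", { status: "OK" }, 0)),
]);

done.then(() => console.log(store.get("entity-1")));
```

In a multi-process deployment the same effect needs a routing step, for example hashing the document id to pick the worker that owns it, so that all updates for one document reach the same queue.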

Page 19: Building an event system on top MongoDB

IMPOSSIBLE TO KILL

Replica Set

Disaster Recovery

Page 20: Building an event system on top MongoDB

REPLICA SET

• 3-node replica set

• Using priorities to make the stronger nodes win primary (master) elections

• Deployed on different availability zones
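Member priorities live in the replica set configuration document. The sketch below builds one of the shape accepted by rs.initiate() and rs.reconfig(); the set name, hostnames and priority values are placeholders. The two-strong, one-weak split mirrors the 06/14 hardware layout (2 x i2.xlarge + 1 x m3.xlarge).

```javascript
// Build a replica set configuration document from a list of members.
function replicaSetConfig(name, members) {
  return {
    _id: name,
    members: members.map((m, i) => ({
      _id: i,
      host: m.host,
      priority: m.priority, // higher priority: preferred as primary
    })),
  };
}

// Hypothetical hosts, one per availability zone.
const config = replicaSetConfig("rs0", [
  { host: "mongo-a.example.internal:27017", priority: 2 }, // stronger node
  { host: "mongo-b.example.internal:27017", priority: 2 }, // stronger node
  { host: "mongo-c.example.internal:27017", priority: 1 }, // weaker node
]);

console.log(JSON.stringify(config, null, 2));
// In the mongo shell this document would be passed to rs.initiate(config).
```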

Page 21: Building an event system on top MongoDB

DISASTER RECOVERY

• Cold backup using MMS Backup

• Full production replication in another EC2 region: using MongoDB’s replication mechanism to continuously sync data to the backup region

Page 22: Building an event system on top MongoDB

THANK YOU!