agility and scalability with mongodb


Upload: mongodb

Post on 29-Nov-2014


DESCRIPTION

MongoDB has taken a clear lead in adoption among the new generation of databases, including the enormous variety of NoSQL offerings. A key reason for this lead has been a unique combination of agility and scalability. Agility provides business units with a quick start and flexibility to maintain development velocity, despite changing data and requirements. Scalability maintains that flexibility while providing fast, interactive performance as data volume and usage increase. We'll address the key organizational, operational, and engineering considerations to ensure that agility and scalability stay aligned at increasing scale, from small development instances to web-scale applications. We will also survey some key examples of highly-scaled customer applications of MongoDB.

TRANSCRIPT

Page 1: Agility and Scalability with MongoDB

MongoDB Scalability and Agility

[email protected]

Page 2: Agility and Scalability with MongoDB

Data Challenge: “I want my data...”

• Now

• Secure

• All varieties

• Fast and interactive

• Scalable to “Big”

• Agile to develop and deploy operationally

• Cloud and edge

iStock licensed (pixelfit)

Page 3: Agility and Scalability with MongoDB

Scalability with MongoDB

Metric | Meaning | Examples
Operations per Second | Concurrent reads and writes per second | > 1 million per second
Nodes per Cluster | Horizontal scale-out, distributed to multiple data centers worldwide, with high availability, using inexpensive cloud resources | > 1000 nodes
Records / Documents | Data objects in any number of schemas or structures | > 10 billion
Data Volume | Total amount of data: documents × size | > 1 petabyte (10^15 = 1,000,000,000,000,000 bytes ≈ 2^50 bytes)

Page 4: Agility and Scalability with MongoDB

Key Differentiation

Page 5: Agility and Scalability with MongoDB


Operational Database Landscape

Page 6: Agility and Scalability with MongoDB

Document Data Model

[Figure panels: Relational | MongoDB]

{
  first_name: 'Paul',
  surname: 'Miller',
  city: 'London',
  location: [45.123, 47.232],
  cars: [
    { model: 'Bentley', year: 1973, value: 100000, … },
    { model: 'Rolls Royce', year: 1965, value: 330000, … }
  ]
}

Page 7: Agility and Scalability with MongoDB

Documents are Rich Data Structures

{
  first_name: 'Paul',
  surname: 'Miller',
  cell: '+447557505611',
  city: 'London',
  location: [45.123, 47.232],
  Profession: ['banking', 'finance', 'trader'],
  cars: [
    { model: 'Bentley', year: 1973, value: 100000, … },
    { model: 'Rolls Royce', year: 1965, value: 330000, … }
  ]
}

• Fields

• Typed field values: String, Number, Geo-coordinates

• Fields can contain arrays

• Fields can contain an array of sub-documents
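As a rough illustration of why embedded arrays reduce the need for joins, the slide's document can be written as a plain Python dict and queried directly. This is only a sketch: the helper names `total_car_value` and `models_before` are hypothetical, the elided "…" fields are left out, and no MongoDB driver is involved.

```python
# The slide's example document as a Python dict (elided fields omitted).
person = {
    "first_name": "Paul",
    "surname": "Miller",
    "cell": "+447557505611",
    "city": "London",
    "location": [45.123, 47.232],                    # geo-coordinates
    "Profession": ["banking", "finance", "trader"],  # array of strings
    "cars": [                                        # array of sub-documents
        {"model": "Bentley", "year": 1973, "value": 100000},
        {"model": "Rolls Royce", "year": 1965, "value": 330000},
    ],
}


def total_car_value(doc):
    """Sum the value field over the embedded cars array (no join needed)."""
    return sum(car["value"] for car in doc.get("cars", []))


def models_before(doc, year):
    """Return the models of embedded cars built before the given year."""
    return [car["model"] for car in doc.get("cars", []) if car["year"] < year]


print(total_car_value(person))      # 430000
print(models_before(person, 1970))  # ['Rolls Royce']
```

In a relational schema the cars would typically live in a second table keyed by owner; here they travel with the owning document.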

Page 8: Agility and Scalability with MongoDB

Document Model Benefits

• Agility and flexibility

– Data model supports business change

– Rapidly iterate to meet new requirements

• Intuitive, natural data representation

– Eliminates ORM layer

– Developers are more productive

• Reduces the need for joins, disk seeks

– Programming is simpler

– Performance delivered at scale

Page 9: Agility and Scalability with MongoDB


Big Data Tech Interest Comparison

j.mp/Ssvpev

Page 10: Agility and Scalability with MongoDB


Enterprise Adoption Comparison

bit.ly/1vAI7rF

Page 11: Agility and Scalability with MongoDB

Architecture for Availability & Scalability

Page 12: Agility and Scalability with MongoDB

Replica Sets

• Replica Set – two or more copies

• Availability solution

– High Availability

– Disaster Recovery

– Maintenance

• Deployment Flexibility

– Data locality to users

– Workload isolation: operational & analytics

• Self-healing shard

[Diagram: Application and Driver connected to the Primary, with Replication to two Secondaries]
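The "self-healing" behavior depends on the surviving members being able to elect a new primary, which requires a majority of the voting members. The sketch below only does that arithmetic; `majority` and `fault_tolerance` are hypothetical helper names, and every member is assumed to vote.

```python
def majority(voting_members: int) -> int:
    """Votes needed to elect a primary: more than half of the voting members."""
    return voting_members // 2 + 1


def fault_tolerance(voting_members: int) -> int:
    """Members that can be lost while a primary can still be elected."""
    return voting_members - majority(voting_members)


for n in (2, 3, 5):
    print(n, majority(n), fault_tolerance(n))
# 2 2 0
# 3 2 1
# 5 3 2
```

A two-member set holds two copies of the data but cannot elect a primary after losing either member, which is why three members is the usual minimum for automatic failover.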

Page 13: Agility and Scalability with MongoDB

Global Data Distribution

[Figure: one Primary and seven Secondaries, each location labeled "Real-time"]

Page 14: Agility and Scalability with MongoDB

Automatic Sharding

• Sharding types

– Range

– Hash

– Tag-aware

• Elastic increase or decrease in capacity

• Automatic balancing

Page 15: Agility and Scalability with MongoDB


Query Routing

• Multiple query optimization models

• Each sharding option appropriate for different apps

Page 16: Agility and Scalability with MongoDB

Performance

Page 17: Agility and Scalability with MongoDB


Drag Strip: straight ahead, quarter-mile, stop

Page 18: Agility and Scalability with MongoDB

Road Race: stay fast, stay agile, continuous

Nürburgring, Germany

Page 19: Agility and Scalability with MongoDB

MongoDB at Scale

Page 20: Agility and Scalability with MongoDB

CARFAX

• Large data set

Page 21: Agility and Scalability with MongoDB

In-depth NoSQL evaluation

Baseline

• Vehicle History Database

• 11 billion records (growing at 1 billion per year)

• 30-year-old VMS-based RDBMS

• Cumbersome

• Costly

MongoDB Comparison

• Performance: 4x faster than baseline, 10x key-value

• Scale out using inexpensive commodity servers

• Built-in redundancy

• Flexible dynamic schema data model

• Strong consistency

• Analytics/aggregation

Initial Production

• MongoDB is primary data store

• 50 servers

• 10 shards

• 5-node replica sets per shard

Page 22: Agility and Scalability with MongoDB

CARFAX Sharding and Replication

• 13 billion+ documents

– 1.5 billion documents added every year

• 1 vehicle history report is > 200 documents

• 12 shards

• 9-node replica sets

• Replicas distributed across 3 data centers
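The CARFAX figures can be cross-checked with a little arithmetic. The sketch below uses only numbers from the slides; the even spread of documents across shards and of replicas across data centers is an assumption made for illustration, and the dictionary and function names are hypothetical.

```python
# Figures quoted on the CARFAX slides.
initial = {"shards": 10, "nodes_per_shard": 5}
current = {
    "shards": 12,
    "nodes_per_shard": 9,
    "data_centers": 3,
    "documents": 13_000_000_000,
}


def total_nodes(deployment):
    """Data-bearing members: one replica set per shard."""
    return deployment["shards"] * deployment["nodes_per_shard"]


print(total_nodes(initial))   # 50, matching the "50 servers" bullet
print(total_nodes(current))   # 108

# Assumption: replicas and documents are spread evenly.
replicas_per_dc = current["nodes_per_shard"] // current["data_centers"]
docs_per_shard = current["documents"] / current["shards"]
print(replicas_per_dc)                 # 3
print(round(docs_per_shard / 1e9, 2))  # 1.08 (billion documents per shard)
```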

Page 23: Agility and Scalability with MongoDB


CARFAX Replication

Page 24: Agility and Scalability with MongoDB


Page 25: Agility and Scalability with MongoDB

Foursquare

• 50M users

• 6B check-ins to date (6M per day growth)

• 55M points of interest / venues

• 1.7M merchants using the platform for marketing

• Operations per second: 300,000

• Documents: 5.5B (~16.5B with replication)*

Page 26: Agility and Scalability with MongoDB

Foursquare clusters

• 11 MongoDB clusters

– 8 are sharded

• Largest cluster for check-ins

• 15 shards (check-ins)

• Shard key: user_id

Page 27: Agility and Scalability with MongoDB


Facebook / parse.com mobile apps

• Persistent database for 270,000 mobile applications

• 200 M end-user mobile devices

• 250% annual growth in client apps

• 500% growth in requests

• 1.5 M collections

• Key differentiators:

– Document data model

– High perf. & avail.

– Geospatial query and index

• Charity Majors operations: j.mp/X3jVRC

– Understand your database and your data, and build for them.

Page 28: Agility and Scalability with MongoDB
Page 29: Agility and Scalability with MongoDB

Scalability Exercises in the Cloud with Amazon Web Services

Page 30: Agility and Scalability with MongoDB

Petascale Database

• 27x hs1.8xlarge instances

– 16x VCPU

– 24x 2TB SATA drives, RAID0

– 8x mongod microshards

• Modified Yahoo Cloud Serving Benchmark (YCSB)

– Long Integer IDs (>2B)

– Zipfian-distributed integer fields

– Aggregation queries

• Load direct to 216 shards, 10 days, $4K

Reported statistics (commas added):

    "objects" : 7,170,648,489,
    "avgObjSize" : 147,438.99952658816,
    "dataSize" : NumberLong("1,057,240,224,818,640")
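The petascale numbers on this slide can be sanity-checked with a few lines of arithmetic. This sketch uses only the figures quoted above; the variable names are illustrative.

```python
import math

instances = 27
microshards_per_instance = 8
objects = 7_170_648_489
avg_obj_size = 147_438.99952658816   # bytes
data_size = 1_057_240_224_818_640    # bytes

# 27 instances x 8 mongod microshards each gives the 216 shards on the slide.
shards = instances * microshards_per_instance
print(shards)  # 216

# objects x average size should land close to the reported dataSize.
estimated = objects * avg_obj_size
print(math.isclose(estimated, data_size, rel_tol=1e-4))  # True

# Decimal petabytes (10^15 bytes) versus binary pebibytes (2^50 bytes).
print(data_size > 10**15)              # True
print(round(data_size / 2**50, 2))     # 0.94
```

The data set therefore clears one petabyte in the decimal sense used earlier in the deck, while sitting slightly under 2^50 bytes.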

Page 31: Agility and Scalability with MongoDB

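CGroup Memory Segregation

for DB in $(seq 0 3); do
    # One memory cgroup per mongod instance, owned by the mongodb user.
    sudo cgcreate \
        -a mongodb:mongodb \
        -t mongodb:mongodb \
        -g memory:mongodb$DB
    # Cap the group at 48G. tee runs under sudo, unlike a shell redirect.
    echo 48G | sudo tee \
        /sys/fs/cgroup/memory/mongodb$DB/memory.limit_in_bytes
    # Start mongod inside its cgroup with NUMA interleaving.
    cgexec \
        -g memory:mongodb$DB \
        numactl --interleave=all \
        mongod --config ~/mongod$DB.conf
done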

Page 32: Agility and Scalability with MongoDB

Megawrite Ingest

• Ingest 250-byte stock quotes at 2M/s

• Concurrently run 5 QPS, subsecond/indexed response on timeStamp, accountId, instrumentId, systemKey

• 5x r3.4xlarge

– 16x VCPU, 1x 320GB SSD, 122GB RAM, 16x mongod

– 2.1M insert/second direct to shards

• 16x c3.8xlarge

– 32x VCPU, 2x 320GB SSD, 60GB RAM, 16x mongod, 4x mongos

– 2.1M insert/second via mongos
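To put the ingest target in perspective, the figures above translate into raw bandwidth and per-process rates. This is a back-of-the-envelope sketch; reading "16x mongod" as a per-instance count (like the VCPU and RAM figures beside it) is an assumption, and the variable names are illustrative.

```python
quote_bytes = 250
target_rate = 2_000_000      # quotes per second (the stated goal)
achieved_rate = 2_100_000    # inserts per second (both configurations)

# Payload bandwidth implied by the target, ignoring index and protocol overhead.
print(quote_bytes * target_rate / 1e6)   # 500.0 (MB/s)

# Assumption: 16 mongod processes on every instance.
direct_mongods = 5 * 16      # r3.4xlarge configuration, direct to shards
routed_mongods = 16 * 16     # c3.8xlarge configuration, via mongos

print(achieved_rate / direct_mongods)    # 26250.0 inserts/s per mongod
print(achieved_rate / routed_mongods)    # 8203.125 inserts/s per mongod
```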

Page 33: Agility and Scalability with MongoDB

Java API comparison

• 2 threads on c3.8xlarge

• 264-byte objects (bsonsize), _id index only

• coll.insert(): 15,600 inserts/sec

• coll.insert(List<DBObject>), list size = 64: 118,000 inserts/sec

• Bulk ops API, size = 64: 120,000 inserts/sec
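The relative gain from batching follows directly from the three throughput figures. A small sketch of that arithmetic, with illustrative variable names:

```python
single_insert = 15_600    # coll.insert(), one document per call
list_insert = 118_000     # coll.insert(List<DBObject>), 64 documents per call
bulk_insert = 120_000     # bulk operations API, 64 documents per batch

print(round(list_insert / single_insert, 1))   # 7.6
print(round(bulk_insert / single_insert, 1))   # 7.7
```

Both batched paths come out at a little over seven times the single-document rate on this benchmark.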

Page 34: Agility and Scalability with MongoDB

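7x Load with BulkOp

BulkWriteOperation bo = null;
int pending = 0;  // inserts queued in the current batch
for (int a = 0; a < this.items && stayAlive; a++) {
    if (bo == null) {
        bo = collection.initializeUnorderedBulkOperation();
    }
    fillMap(this.m);
    bo.insert(new BasicDBObject(this.m));
    // Send the batch once it holds listsize inserts.
    if (++pending == listsize) {
        BulkWriteResult rc = bo.execute();
        bo = null;
        pending = 0;
    }
}
// Send the final partial batch, if any.
if (bo != null) {
    bo.execute();
}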
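The batching idea behind the bulk loader is independent of the driver. The following is a minimal, driver-free sketch in Python: `batches` and `round_trips` are hypothetical helpers, and each yielded batch stands in for one bulk `execute()` call.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batches(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Group items into lists of at most `size`, including a final partial one."""
    if size < 1:
        raise ValueError("size must be at least 1")
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # do not drop the leftover items
        yield batch


def round_trips(n_items: int, size: int) -> int:
    """Number of batches (server round trips) needed for n_items inserts."""
    return sum(1 for _ in batches(range(n_items), size))


print(round_trips(1000, 1))    # 1000
print(round_trips(1000, 64))   # 16
```

Fewer round trips for the same number of documents is the main reason the list and bulk variants outperform one-at-a-time inserts.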

Page 35: Agility and Scalability with MongoDB

How do I Pick A Shard Key?

Page 36: Agility and Scalability with MongoDB

Shard Key characteristics

• A good shard key has:

– sufficient cardinality

– distributed writes

– targeted reads ("query isolation")

• Shard key should be in every query if possible

– scatter gather otherwise

• Choosing a good shard key is important!

– affects performance and scalability

– changing it later is expensive
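The difference between a targeted read and a scatter-gather read can be sketched with a toy router. Everything here is hypothetical: the chunk boundaries, the shard names, and the `route` function are made up for illustration, and only equality matches on the shard key are handled. It is not how mongos is implemented.

```python
import bisect

# Hypothetical range-sharded layout on user_id:
# chunk i holds keys in [BOUNDS[i], BOUNDS[i + 1]); the last chunk is open-ended.
BOUNDS = [float("-inf"), 1000, 2000, 3000]
SHARDS = ["shard1", "shard2", "shard3", "shard4"]


def route(query: dict, shard_key: str = "user_id") -> list:
    """Return the shards a query must contact."""
    if shard_key in query:
        # Targeted read: the shard key value identifies exactly one chunk.
        idx = bisect.bisect_right(BOUNDS, query[shard_key]) - 1
        return [SHARDS[idx]]
    # No shard key in the query: every shard has to be asked (scatter gather).
    return list(SHARDS)


print(route({"user_id": 2500}))       # ['shard3']
print(route({"venue": "coffee bar"})) # ['shard1', 'shard2', 'shard3', 'shard4']
```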

Page 37: Agility and Scalability with MongoDB

Hashed shard key

• Pros:

– Evenly distributed writes

• Cons:

– Random data (and index) updates can be IO intensive

– Range-based queries turn into scatter gather

[Diagram: mongos routing to Shard 1, Shard 2, Shard 3, ... Shard N]
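Both the pro and the second con can be seen in a small simulation. This is a simplified sketch: MD5 modulo the shard count stands in for the real hashed-chunk assignment, and `shard_for` and the shard count of 4 are assumptions chosen for illustration.

```python
import hashlib
from collections import Counter

N_SHARDS = 4  # hypothetical cluster size


def shard_for(key: int, n_shards: int = N_SHARDS) -> int:
    """Map a key to a shard by hashing it (stand-in for a hashed shard key)."""
    digest = hashlib.md5(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % n_shards


# Pro: sequential keys spread almost evenly across the shards.
writes = Counter(shard_for(k) for k in range(100_000))
print(sorted(writes.items()))

# Con: a range query over 100 adjacent keys ends up touching every shard.
touched = {shard_for(k) for k in range(5000, 5100)}
print(sorted(touched))
```

Adjacent key values land on unrelated shards, which is what spreads the writes and also what forces range queries to fan out.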

Page 38: Agility and Scalability with MongoDB

Low cardinality shard key

• Induces "jumbo chunks"

• Examples: boolean field

[Diagram: mongos routing to Shard 1, Shard 2, Shard 3, ... Shard N; chunk range [ a, b )]
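A chunk can only be split between distinct shard key values, so the number of distinct values caps the number of chunks. The sketch below illustrates this with made-up data; `max_chunks`, `largest_unsplittable_group`, and the `active` and `user_id` fields are hypothetical.

```python
from collections import Counter


def max_chunks(shard_key_values) -> int:
    """Upper bound on chunk count: one chunk per distinct key value."""
    return len(set(shard_key_values))


def largest_unsplittable_group(shard_key_values) -> int:
    """Documents sharing the most common key value; they cannot be separated."""
    return Counter(shard_key_values).most_common(1)[0][1]


# Hypothetical collection: a boolean flag versus a unique id.
docs = [{"active": i % 3 == 0, "user_id": i} for i in range(9000)]

print(max_chunks(d["active"] for d in docs))                  # 2
print(largest_unsplittable_group(d["active"] for d in docs))  # 6000
print(max_chunks(d["user_id"] for d in docs))                 # 9000
```

With a boolean key, two thirds of this sample collection sits in a single group that no split can divide, which is the "jumbo chunk" situation.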

Page 39: Agility and Scalability with MongoDB

Ascending shard key

• Monotonically increasing shard key values cause "hot spots" on inserts

• Examples: timestamps, _id

[Diagram: mongos routing to Shard 1, Shard 2, Shard 3, ... Shard N; chunk range [ ISODate(…), $maxKey )]
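The hot spot arises because every new value is larger than all existing ones and therefore falls into the chunk that ends at $maxKey. A toy simulation, with hypothetical chunk bounds and integer timestamps standing in for ISODate values:

```python
import bisect
from collections import Counter

# Hypothetical chunk lower bounds on a timestamp shard key.
# The last chunk covers [3000, $maxKey) and lives on shardN.
CHUNK_BOUNDS = [float("-inf"), 1000, 2000, 3000]
CHUNK_OWNER = ["shard1", "shard2", "shard3", "shardN"]


def owner(ts: int) -> str:
    """Shard that owns the chunk containing this timestamp."""
    return CHUNK_OWNER[bisect.bisect_right(CHUNK_BOUNDS, ts) - 1]


# New inserts carry ever-increasing timestamps beyond the last split point.
load = Counter(owner(ts) for ts in range(3000, 4000))
print(load)  # Counter({'shardN': 1000})
```

Until the balancer splits and moves that last chunk, a single shard absorbs all of the insert traffic.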

Page 40: Agility and Scalability with MongoDB

Ensuring Success with High Scalability

Page 41: Agility and Scalability with MongoDB


Success Factors

• Storage: random seeks (IOPS)

• RAM: working set based on query patterns

• Query: indexing

• Delete: most expensive operation

• Real-time vs. bulk operations

• Continuity: HA, DR, backup, restore

• Agile process: iterate by powers of 4

• Sharding: shard key and strategy

• Resources: don’t go it alone!