High Performance NoSQL with MongoDB

SDD 2016, sddconf.com/brands/sdd/library/high-performance...

Michael Kennedy | @mkennedy | michaelckennedy.net

History of NoSQL

June 11th, 2009, San Francisco, USA

Johan Oskarsson (from http://last.fm/) organized a meetup to discuss advances in data storage, all of which used distributed databases running on clusters. He asked the group for a short term they could use as a hashtag. [1]

Eric Evans (not of DDD fame) proposed #NoSQL and it stuck.

Michael's NoSQL Definition

Database systems which are cluster-friendly and which trade inter-entity relationships for both simplicity and performance.

Four types of "NoSQL" DBs

• Key-value stores
  – Amazon DynamoDB
  – Redis
• Column-oriented databases
  – HBase
  – Cassandra
  – Google BigQuery
• Graph databases
  – Neo4j
  – OrientDB
• Document databases
  – MongoDB
  – CouchDB
  – DocumentDB (on Azure)

Key-value data storage

Column Oriented DBs

Graph DBs

Document DBs

Not so different

How much do you need perf?

Image credit: nerovivo

Relational 3NF models are complex

Document DBs for simplicity

Document db style

Single server performance

Single biggest performance problem (and fix)?

Incorrect indexes (too few or too many)

Adding indexes

• Be data-driven: profile and then add indexes
• Indexes matter even more here than in an RDBMS

Demo time

Step 1: Enable profiling

Step 2: Run common queries

Step 3: Analyze system.profile

Step 4: Add indexes for slow queries

Step 5: GOTO 1
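
The loop above can be sketched as three small helpers for the mongo shell. This is a minimal sketch, not the code from the demo: it assumes `db` is a shell database handle, and the helper names, the thresholds, and the `data` collection with a compound index on `st` and `ts` (borrowed from the weather queries later in the talk) are illustrative assumptions.

```javascript
// Step 1: record every operation slower than thresholdMs milliseconds.
// Profiling level 1 means "slow operations only".
function enableProfiling(db, thresholdMs) {
  return db.setProfilingLevel(1, { slowms: thresholdMs });
}

// Step 3: after the common queries have run (step 2), read the slowest
// operations back from the system.profile collection, worst first.
function slowestQueries(db, thresholdMs, limit) {
  return db.system.profile
    .find({ millis: { $gt: thresholdMs } })
    .sort({ millis: -1 })
    .limit(limit)
    .toArray();
}

// Step 4: add an index that matches the fields a slow query filters on.
// The collection name and key pattern are assumptions for illustration.
function indexWeatherLookups(db) {
  return db.data.createIndex({ st: 1, ts: 1 });
}
```

In the shell this might be used as `enableProfiling(db, 20)`, then, after exercising the application, `slowestQueries(db, 20, 5)`. Each index speeds up matching reads but costs write time and memory, which is why step 5 sends you back to the profiler instead of adding indexes up front.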

Scaling out

Image credit: johnantoni
Image credit: Torkild Retvedt

Scaling out

• Scale-out is the great promise of NoSQL
• MongoDB has two modes of scale out
  – Sharding
  – Replication

Real-world statistics from one company

120,000 DB operations / second
2 GB of app-to-db I/O / second

Replication vs. scalability

• Sharding is the primary way to improve single query speed
• Replication is not the primary way to scale
  – reads may get faster, but writes do not improve much, so it only helps when the workload is very read heavy

Replication               Sharding
Server 1: A-B-C-D-E       Server 1: A
Server 2: A-B-C-D-E       Server 2: B
Server 3: A-B-C-D-E       Server 3: C
Server 4: A-B-C-D-E       Server 4: D
Server 5: A-B-C-D-E       Server 5: E
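
The two layouts in the diagram can be modelled in a few lines to show why replication does little for writes. This is a toy in-memory sketch, not MongoDB's placement logic: the function names are made up, and the round-robin assignment is an assumption chosen to reproduce the diagram (MongoDB itself places data by ranges or hashes of a shard key).

```javascript
// Replication: every server stores a full copy of every chunk.
function replicate(servers, chunks) {
  const layout = {};
  for (const server of servers) {
    layout[server] = [...chunks];
  }
  return layout;
}

// Sharding: each chunk lives on exactly one server (round-robin here).
function shard(servers, chunks) {
  const layout = {};
  for (const server of servers) {
    layout[server] = [];
  }
  chunks.forEach((chunk, i) => {
    layout[servers[i % servers.length]].push(chunk);
  });
  return layout;
}

// Number of servers that have to apply a write to the given chunk.
function serversTouchedByWrite(layout, chunk) {
  return Object.values(layout).filter((held) => held.includes(chunk)).length;
}

const servers = ["Server 1", "Server 2", "Server 3", "Server 4", "Server 5"];
const chunks = ["A", "B", "C", "D", "E"];

const replicated = replicate(servers, chunks);
const sharded = shard(servers, chunks);

console.log(serversTouchedByWrite(replicated, "C")); // 5: every copy applies the write
console.log(serversTouchedByWrite(sharded, "C"));    // 1: only the owning server writes
```

With five replicas every write is applied five times, so adding replicas adds read capacity only. With five shards each write lands on one server, so write capacity grows with the number of servers.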

Sharding

...

Scaling via Sharding – an example

Weather data from the entire 20th century in MongoDB
Case study by MongoDB Inc:
http://www.mongodb.com/presentations/weather-century-part-2-high-performance

Data size and quantity

• 2.5 billion data points
• 4 terabytes (1.6 KB per document)
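
The document count and per-document size are consistent with the quoted total, as a quick calculation shows. This sketch assumes decimal units (1 KB = 1,000 bytes, 1 TB = 10^12 bytes); the variable names are illustrative.

```javascript
// Figures from the slide; decimal units are an assumption.
const documents = 2.5e9;        // 2.5 billion data points
const bytesPerDocument = 1.6e3; // 1.6 KB per document

const totalBytes = documents * bytesPerDocument;
const totalTerabytes = totalBytes / 1e12;

console.log(totalTerabytes); // 4
```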

Sample record (JSON)

{
  "st" : "u725053",
  "ts" : ISODate("2013-06-03T22:51:00Z"),
  "airTemperature" : {
    "value" : 21.1,
    "quality" : "5"
  },
  "atmosphericPressure" : {
    "value" : 1009.7,
    "quality" : "5"
  }
}

Sample record in C#

using System;

class WeatherRecord
{
    public string st { get; set; }
    public DateTime ts { get; set; }
    public Temp airTemperature { get; set; }
    public Pressure atmosphericPressure { get; set; }
}

class Temp
{
    // double rather than int: the sample record stores 21.1
    public double value { get; set; }
    public string quality { get; set; }
}

class Pressure
{
    // double rather than int: the sample record stores 1009.7
    public double value { get; set; }
    public string quality { get; set; }
}

Scale Up

A single server with a really big disk

Application: c3.8xlarge
mongod: i2.8xlarge, 251 GB RAM, 6 TB SSD

Scale out configuration

A really big cluster where everything is in RAM

Application / mongos: c3.8xlarge
mongod: 100 x r3.2xlarge (61 GB RAM, 100 GB disk)
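
The "everything is in RAM" claim can be checked against the 4 TB data set from the data-size slide. This is a rough sketch under stated assumptions: decimal units, 61 GB of RAM on each of the 100 cluster nodes, and no allowance for indexes, the operating system, or replica copies. The function name is illustrative.

```javascript
const dataSetGB = 4000; // 4 TB from the data-size slide, decimal units assumed

// Fraction of the data set that could sit in RAM at once (capped at 100%).
function fractionInRam(ramPerNodeGB, nodes) {
  return Math.min(1, (ramPerNodeGB * nodes) / dataSetGB);
}

const singleServer = fractionInRam(251, 1); // the i2.8xlarge scale-up box
const cluster = fractionInRam(61, 100);     // 100 x r3.2xlarge

console.log((singleServer * 100).toFixed(1) + "%"); // 6.3%
console.log((cluster * 100).toFixed(1) + "%");      // 100.0%
```

The cluster has about 6.1 TB of RAM in total, so the whole data set fits with room to spare, while the single server can hold only a small fraction and must go to SSD for the rest.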

Can scale even more

A really big cluster where everything is in RAM

Application / mongos
mongod: 100 x r3.2xlarge (61 GB RAM, 100 GB disk)

Cost per year in AWS?

Single server: $60,000 / yr
Cluster: $700,000 / yr

Performance: single time and place

db.data.find({"st" : "u747940", "ts" : ISODate("1969-07-16T12:00:00Z")})

[Chart: latency in ms (avg, 95th, 99th percentile) for single server vs. cluster, axis 0 to 2 ms]

max. throughput: 40,000/s (single server), 610,000/s (cluster, 10 mongos)

Performance: 1 year's weather

db.data.find({"st" : "u747940",
              "ts" : {"$gte" : ISODate("1989-01-01"),
                      "$lt" : ISODate("1990-01-01")}})

[Chart: latency in ms (avg, 95th, 99th percentile) for single server vs. cluster, axis 0 to 5000 ms]

max. throughput: 20/s (single server), 430/s (cluster, 10 mongos, targeted query)

Analytics

db.data.aggregate([
  { "$match" : { "airTemperature.quality" : { "$in" : [ "1", "5" ] } } },
  { "$group" : { "_id" : null,
                 "maxTemp" : { "$max" : "$airTemperature.value" } } }
])

Result: 61.8 °C = 143 °F

Cluster: 2 min
Single server: 4 h 45 min
142x faster
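
Putting the cost slide next to the three benchmarks shows what the extra money buys. The sketch below only divides numbers quoted in the talk; the object and field names are illustrative, and it assumes the $60,000 and 40,000/s figures belong to the single server and the $700,000 and 610,000/s figures to the cluster.

```javascript
// Numbers copied from the cost and benchmark slides.
const oneServer = { costPerYear: 60000, pointQps: 40000, rangeQps: 20, analyticsMin: 4 * 60 + 45 };
const bigCluster = { costPerYear: 700000, pointQps: 610000, rangeQps: 430, analyticsMin: 2 };

// Cluster-to-single-server ratio for cost and for each workload.
const ratios = {
  cost: bigCluster.costPerYear / oneServer.costPerYear,
  pointQuery: bigCluster.pointQps / oneServer.pointQps,
  yearRange: bigCluster.rangeQps / oneServer.rangeQps,
  analytics: oneServer.analyticsMin / bigCluster.analyticsMin,
};

for (const [name, ratio] of Object.entries(ratios)) {
  console.log(`${name}: ${ratio.toFixed(1)}x`);
}
```

The cluster costs roughly 12 times as much per year, and buys about 15 times the point-query throughput, about 21 times the one-year range throughput, and a 142.5-fold shorter analytics run (285 minutes against 2), which the slide reports as 142x.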

Get the code and data

https://github.com/mikeckennedy/sdd2016

Want to go deeper?

talkpython.fm
training.talkpython.fm
@mkennedy