High Performance NoSQL with MongoDB - SDD 2016 (sddconf.com/brands/sdd/library/high-performance...)

Post on 20-May-2020

High Performance NoSQL with MongoDB

Michael Kennedy | @mkennedy | michaelckennedy.net

History of NoSQL

June 11th, 2009, San Francisco, USA

Johan Oskarsson (from http://last.fm/) organized a meetup to discuss advances in data storage which were all using distributed databases leveraging clusters. He asked the group for a short term they could use as a hashtag. [1]

Eric Evans (not of DDD fame) proposed #NoSQL and it stuck.

Michael's NoSQL Definition

Database systems which are cluster-friendly and which trade inter-entity relationships for both simplicity and performance.

Four types of "NoSQL" DBs

• Key-Value Stores
– Amazon DynamoDB
– Redis
• Column-Oriented Databases
– HBase
– Cassandra
– Google BigQuery
• Graph Databases
– Neo4j
– OrientDB
• Document Databases
– MongoDB
– CouchDB
– DocumentDB (on Azure)

Key-value data storage

Column Oriented DBs

Graph DBs

Document DBs

Not so different

How much performance do you need?

Image credit: nerovivo

Relational 3NF models are complex

Document DBs for simplicity

Document db style

Single server performance

Single biggest performance problem (and fix)?

Incorrect indexes (too few or too many)

Adding indexes

• Be data-driven: profile and then add indexes
• Indexes matter even more than in an RDBMS

Demo time

Step 1: Enable profiling

Step 2: Run common queries

Step 3: Analyze system.profile

Step 4: Add indexes for slow queries

Step 5: GOTO 1
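
The loop above can be sketched in code. In the mongo shell the steps would typically use db.setProfilingLevel(1, 20) to record operations slower than 20 ms, a query against db.system.profile to read them back, and db.data.createIndex(...) to add an index. The plain JavaScript below only illustrates the analysis part (steps 3 and 4) on an in-memory array. The sample documents, the 20 ms threshold, and the helper names slowOperations and indexCandidates are assumptions made for this sketch, not part of the talk.

```javascript
// Hypothetical sample of system.profile documents. The field names
// (op, ns, millis, planSummary) follow MongoDB's profiler output, but
// the values are made up for illustration.
const profileDocs = [
  { op: "query", ns: "weather.data", millis: 4210, planSummary: "COLLSCAN" },
  { op: "query", ns: "weather.data", millis: 3, planSummary: "IXSCAN { st: 1, ts: 1 }" },
  { op: "update", ns: "weather.stations", millis: 95, planSummary: "COLLSCAN" },
  { op: "query", ns: "weather.stations", millis: 12, planSummary: "IXSCAN { _id: 1 }" },
];

// Step 3: keep operations slower than thresholdMs, slowest first.
function slowOperations(docs, thresholdMs) {
  return docs
    .filter((doc) => doc.millis > thresholdMs)
    .sort((a, b) => b.millis - a.millis);
}

// Step 4: slow operations that scanned the whole collection are the
// first candidates for a new index. Returns their namespaces.
function indexCandidates(docs, thresholdMs) {
  return slowOperations(docs, thresholdMs)
    .filter((doc) => doc.planSummary === "COLLSCAN")
    .map((doc) => doc.ns);
}

console.log(indexCandidates(profileDocs, 20));
```

After adding an index for a candidate, rerunning the common queries and checking the profile again (step 5) shows whether the slow entries have disappeared.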

Scaling out

Image credit: johnantoni
Image credit: Torkild Retvedt

Scaling out

• Scale-out is the great promise of NoSQL
• MongoDB has two modes of scale out
– Sharding
– Replication

Real-world statistics from one company:

120,000 DB operations / second
2 GB of app-to-db I/O / second

Replication vs. scalability

• Sharding is the primary way to improve single query speed
• Replication is not the primary way to scale
– Reads may get faster, but writes do not improve much, so replication only helps when the workload is very read heavy

Replication
Server 1: A-B-C-D-E
Server 2: A-B-C-D-E
Server 3: A-B-C-D-E
Server 4: A-B-C-D-E
Server 5: A-B-C-D-E

Sharding
Server 1: A
Server 2: B
Server 3: C
Server 4: D
Server 5: E
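
A toy model of the diagram may help show why replication does little for write performance. The sketch below places five slices of data (A to E) on five servers in both modes and counts how many servers must apply a write to one slice. The round-robin placement and the helper names are assumptions for illustration. MongoDB itself routes documents by shard key ranges or hashes, not round-robin.

```javascript
// Toy model of the diagram: five servers and five slices of data (A to E).
const servers = ["Server 1", "Server 2", "Server 3", "Server 4", "Server 5"];
const slices = ["A", "B", "C", "D", "E"];

// Replication: every server stores every slice.
function replicate(serverNames, sliceNames) {
  const layout = {};
  for (const server of serverNames) {
    layout[server] = [...sliceNames];
  }
  return layout;
}

// Sharding: each slice lives on exactly one server. Round-robin placement
// is an assumption made for this sketch only.
function shard(serverNames, sliceNames) {
  const layout = {};
  for (const server of serverNames) {
    layout[server] = [];
  }
  sliceNames.forEach((slice, i) => {
    layout[serverNames[i % serverNames.length]].push(slice);
  });
  return layout;
}

// Number of servers that must apply a write to the given slice.
function serversTouchedByWrite(layout, slice) {
  return Object.values(layout).filter((held) => held.includes(slice)).length;
}

const replicated = replicate(servers, slices);
const sharded = shard(servers, slices);
console.log(serversTouchedByWrite(replicated, "C")); // every server applies the write
console.log(serversTouchedByWrite(sharded, "C"));    // only the owning server does
```

In the replicated layout each write costs work on all five servers, so adding servers adds read capacity but not write capacity. In the sharded layout a write lands on one server, so writes spread across the cluster.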

Sharding

...

Scaling via Sharding – an example

Weather data from the entire 20th century in MongoDB

Case study by MongoDB Inc:
http://www.mongodb.com/presentations/weather-century-part-2-high-performance

Data size and quantity

• 2.5 billion data points
• 4 terabytes (1.6 KB per document)
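
The two figures on this slide are consistent with each other, as a quick calculation shows. The sketch assumes decimal units (1 KB = 1,000 bytes, 1 TB = 10^12 bytes). The function name is hypothetical.

```javascript
// Total data volume in terabytes, assuming decimal units.
function totalTerabytes(documentCount, bytesPerDocument) {
  return (documentCount * bytesPerDocument) / 1e12;
}

// Figures from the slide: 2.5 billion documents at about 1.6 KB each.
console.log(totalTerabytes(2.5e9, 1.6e3)); // 4
```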

Sample record (JSON)

{
  "st" : "u725053",
  "ts" : ISODate("2013-06-03T22:51:00Z"),
  "airTemperature" : {
    "value" : 21.1,
    "quality" : "5"
  },
  "atmosphericPressure" : {
    "value" : 1009.7,
    "quality" : "5"
  }
}

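Sample record in C#

using System;

// One weather document; property names match the JSON field names.
class WeatherRecord
{
    public string st { get; set; }
    public DateTime ts { get; set; }
    public Temp airTemperature { get; set; }
    public Pressure atmosphericPressure { get; set; }
}

class Temp
{
    // double rather than int: sample values such as 21.1 are fractional
    public double value { get; set; }
    public string quality { get; set; }
}

class Pressure
{
    // double rather than int: sample values such as 1009.7 are fractional
    public double value { get; set; }
    public string quality { get; set; }
}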

Scale Up

A single server with a really big disk

Application: c3.8xlarge
mongod: i2.8xlarge, 251 GB RAM, 6 TB SSD

Scale out configuration

A really big cluster where everything is in RAM

Application / mongos: c3.8xlarge
mongod: 100 x r3.2xlarge, each with 61 GB RAM and 100 GB disk
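
Whether "everything is in RAM" holds can be checked from the slide figures. The sketch below compares the cluster's total RAM with the 4 TB data set from the earlier slide. It assumes decimal units and ignores indexes, replicas, and operating system overhead, so it is only a rough sanity check. The function name is hypothetical.

```javascript
// Rough check: does the data set fit in the cluster's combined RAM?
// Ignores indexes, replicas, and OS overhead (assumption for this sketch).
function clusterCapacity(nodeCount, ramPerNodeGB, diskPerNodeGB, dataSetGB) {
  const totalRamGB = nodeCount * ramPerNodeGB;
  const totalDiskGB = nodeCount * diskPerNodeGB;
  return {
    totalRamGB,
    totalDiskGB,
    fitsInRam: totalRamGB >= dataSetGB,
    fitsOnDisk: totalDiskGB >= dataSetGB,
  };
}

// 100 nodes, 61 GB RAM and 100 GB disk each, 4 TB (4,000 GB) of data.
console.log(clusterCapacity(100, 61, 100, 4000));
```

With these numbers the cluster has 6,100 GB of RAM and 10,000 GB of disk in total, both above the 4,000 GB data set.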

Can scale even more

A really big cluster where everything is in RAM

Application / mongos
mongod: 100 x r3.2xlarge, each with 61 GB RAM and 100 GB disk

Cost per year in AWS?

Single server: $60,000 / yr
Cluster: $700,000 / yr

Performance: single time and place

db.data.find({"st" : "u747940", "ts" : ISODate("1969-07-16T12:00:00Z")})

[Chart: latency in ms (avg, 95th, 99th percentile), single server vs. cluster, axis 0 to 2 ms]

Max. throughput: 40,000/s (single server) vs. 610,000/s (cluster, 10 mongos)

Performance: 1 year's weather

db.data.find({"st" : "u747940",
              "ts" : {"$gte" : ISODate("1989-01-01"),
                      "$lt" : ISODate("1990-01-01")}})

[Chart: latency in ms (avg, 95th, 99th percentile), single server vs. cluster, axis 0 to 5000 ms]

Max. throughput: 20/s (single server) vs. 430/s (cluster, 10 mongos), targeted query
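
The one-year query uses a half-open date range: $gte on January 1 of the year and $lt on January 1 of the next. A minimal sketch of building such a filter in plain JavaScript follows. The helper names yearFilter and matches are hypothetical, and the in-memory matches function only mimics the comparison for illustration. It is not how MongoDB evaluates the query.

```javascript
// Build a filter for one station and one calendar year (UTC).
// The range is half-open: it includes Jan 1 of `year` and excludes Jan 1 of
// the following year, so filters for consecutive years never overlap.
function yearFilter(station, year) {
  return {
    st: station,
    ts: {
      $gte: new Date(Date.UTC(year, 0, 1)),
      $lt: new Date(Date.UTC(year + 1, 0, 1)),
    },
  };
}

// In-memory check of the same condition, handy for reasoning about edge cases.
function matches(filter, doc) {
  return (
    doc.st === filter.st &&
    doc.ts >= filter.ts.$gte &&
    doc.ts < filter.ts.$lt
  );
}

const filter = yearFilter("u747940", 1989);
console.log(filter.ts.$gte.toISOString());
console.log(matches(filter, { st: "u747940", ts: new Date("1989-12-31T23:59:59Z") }));
console.log(matches(filter, { st: "u747940", ts: new Date("1990-01-01T00:00:00Z") }));
```

The last second of 1989 matches and the first instant of 1990 does not, which is the reason for pairing $gte with $lt rather than using an inclusive upper bound.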

Analytics

db.data.aggregate([
  { "$match" : { "airTemperature.quality" : { "$in" : [ "1", "5" ] } } },
  { "$group" : { "_id" : null,
                 "maxTemp" : { "$max" : "$airTemperature.value" } } }
])

maxTemp: 61.8 °C = 143 °F

Cluster: 2 min
Single server: 4 h 45 min
142x faster
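
The figures quoted across the performance and cost slides can be put side by side. The sketch below computes the cluster-to-single-server ratios. It assumes the $60,000 figure belongs to the single server and the $700,000 figure to the cluster, which the slide text does not state explicitly. The object and function names are hypothetical.

```javascript
// Figures quoted on the slides. The assignment of the two cost figures to
// single server and cluster is an assumption.
const singleServer = {
  costPerYear: 60000,
  pointQueriesPerSec: 40000,
  rangeQueriesPerSec: 20,
  analyticsMinutes: 4 * 60 + 45,
};
const cluster = {
  costPerYear: 700000,
  pointQueriesPerSec: 610000,
  rangeQueriesPerSec: 430,
  analyticsMinutes: 2,
};

// Ratios of cluster to single server for cost and throughput,
// and single server to cluster for analytics run time.
function compare(single, scaledOut) {
  return {
    costRatio: scaledOut.costPerYear / single.costPerYear,
    pointQueryGain: scaledOut.pointQueriesPerSec / single.pointQueriesPerSec,
    rangeQueryGain: scaledOut.rangeQueriesPerSec / single.rangeQueriesPerSec,
    analyticsSpeedup: single.analyticsMinutes / scaledOut.analyticsMinutes,
  };
}

console.log(compare(singleServer, cluster));
```

The analytics speedup comes out at 142.5, matching the "142x faster" on the slide. Under the stated cost assumption the cluster costs roughly 11.7 times as much per year.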

Get the code and data

https://github.com/mikeckennedy/sdd2016

Want to go deeper?

talkpython.fm
training.talkpython.fm
michaelckennedy.net
mikeckennedy@gmail.com
@mkennedy
