MongoDB: Improve the Performance of Your Application (Codemotion 2016)
TRANSCRIPT
MADRID · NOV 18-19 · 2016
19th November 2016, 18:30-19:15
MongoDB: Improve the performance of your application, by @juanroycouto
@codemotion_es
#whoami
❖ @juanroycouto
❖ www.juanroy.es
❖ www.mongodbspain.com/en/author/juanroy
❖ [email protected]
❖ MongoDB DBA at Grupo Undanet
#Agenda
● Data Model
● Application Patterns
● .explain()
● Indexes
● Storage Engines
● Query Optimizer
● Aggregation Framework
● Profiling
● Tuning
● Sharding
● Storage & Compression
#Data Model
● All the information in one document:
○ No joins
○ Single read
○ Higher performance
○ ACID compliance at the document level
● MMAPv1: avoid unbounded document growth
● WiredTiger rewrites the document on each update
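As a sketch of the one-document model, a hypothetical blog post with its comments embedded (collection and field names are illustrative, not from the talk):

> db.posts.insert( {
    title : "Schema design",
    author : "juanroy",
    comments : [                          // embedded: no join, single read
        { user : "ana", text : "Nice!" },
        { user : "luis", text : "+1" }
    ]
} );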
#Application Patterns
● Modify only those fields that have changed
● Project only the fields you need
● Avoid negation in queries
● Use Covered Queries when possible
● Avoid scatter-gather queries
● Write Concern:
○ w : 0 → faster than w : all
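For instance (hypothetical collection and field names), updating only the changed field with $set instead of rewriting the whole document, and projecting only what is needed:

> db.users.update( { _id : 1 }, { $set : { city : "Madrid" } } );  // only the changed field
> db.users.find( { city : "Madrid" }, { _id : 0, name : 1 } );     // project only the needed field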
#.explain()
● Use .explain() to get the time a query needs
● Verbosity options:
○ queryPlanner (default)
○ executionStats (stats for the winning plan)
○ allPlansExecution (stats for all candidate plans)
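The three verbosity levels above can be requested like this (collection name is illustrative):

> db.example.find( { a : 56 } ).explain()                       // queryPlanner (default)
> db.example.find( { a : 56 } ).explain("executionStats")       // winning plan stats
> db.example.find( { a : 56 } ).explain("allPlansExecution")    // all candidate plans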
#.explain()
● In-memory sorts are limited to 32 MB of data; beyond that you need an index
● Index selectivity is the ratio of nReturned to totalKeysExamined (the closer to 1, the better)
#Indexes
● Improve the performance of read-intensive apps
● Overhead:
○ Write operations
○ Disk usage
○ Memory consumption
#Indexes-Working Set

● Working Set is made of:
○ Indexes
○ Your most frequently accessed data
#Indexes-Working Set

● If your Working Set does not fit into RAM:
○ Increase your RAM
○ Shard the database
○ Avoid scanning all documents in the collection
■ Page faults
■ You will fill your Working Set with infrequently accessed data
#Indexes-Before creating indexes
● Understand your data
● Know what your queries will look like
● Create as few indexes as possible
● Put your highly selective fields first in the index
● Check your current indexes before creating new ones
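Before adding a new index, the existing ones and their sizes can be checked (collection name is illustrative):

> db.example.getIndexes()           // list current index definitions
> db.example.stats().indexSizes     // size of each index in bytes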
#Indexes-Advice

● Do not index low-cardinality fields
● Check your index performance
● Monitor your system to check whether:
○ Your strategy is still up to date
○ You need new indexes
○ You can remove unnecessary indexes
#Indexes-Compound
● A compound index performs better than:
○ Single-field indexes
○ Index intersection
● Remove indexes that are prefixes of other indexes
● The combined value of the fields must have high cardinality
#Indexes-Compound
> db.collection.createIndex( { equality, sort, range } );

> db.collection.find( {
    book_author : 'John',           // equality
    book_id : { $gt : 54 }          // range
} ).sort( { book_price : 1 } );     // sort
#Indexes-Compound example

> use test
> db.example.createIndex( { a : 1, b : 1 } );
> db.example.explain("executionStats").find( { a : 56, b : 87 } );
"nReturned" : 100,
"executionTimeMillis" : 2,
"totalKeysExamined" : 100,
"totalDocsExamined" : 100

> db.example.dropIndex( { a : 1, b : 1 } );
> db.example.createIndex( { a : 1 } );
> db.example.createIndex( { b : 1 } );
> db.example.explain("executionStats").find( { a : 56, b : 87 } );
"nReturned" : 100,
"executionTimeMillis" : 36,
"totalKeysExamined" : 10000,
"totalDocsExamined" : 10000
#Indexes-Covered Queries
● They return results directly from the index
● A covered query does not read documents at all:
○ totalDocsExamined : 0
● The index must contain all the fields:
○ Included in the query filter
○ Returned by the query (projection)
#Indexes-Covered Queries example
> use test
> db.example.createIndex( { a : 1, b : 1 } );
> db.example.explain("executionStats").find( { a : 56, b : 87 } );
"nReturned" : 100,
"executionTimeMillis" : 3,
"totalKeysExamined" : 100,
"totalDocsExamined" : 100

> db.example.explain("executionStats").find( { a : 56, b : 87 }, { _id : 0, a : 1, b : 1 } );
"nReturned" : 100,
"executionTimeMillis" : 3,
"totalKeysExamined" : 100,
"totalDocsExamined" : 0
#Indexes-TTL
● TTL (Time To Live) Indexes
○ Allow the user to specify a period of time after which the data will be automatically deleted
#Indexes-TTL example
> use test
> db.timetolive.createIndex( { date : 1 }, { expireAfterSeconds : 20 } );
> db.timetolive.insert( { date : new Date() } );
> db.timetolive.find().pretty()
{
    "_id" : ObjectId("581fb43f5626d57bcf66fe73"),
    "date" : ISODate("2016-11-06T22:52:47.373Z")
}
> db.timetolive.count()
0

The background task that removes expired documents runs every 60 seconds.
#Indexes-Partial
● Partial Indexes
○ Allow you to index a subset of your data by specifying a filter expression during index creation
○ Better query performance while reducing system overhead
#Indexes-Partial example
> use test
> db.sysprofile.createIndex( { ns : 1, op : 1 }, { partialFilterExpression : { millis : { $gt : 10000 } } } );
> db.sysprofile.explain().find( { ns : 'school2.students', op : 'query' }, { millis : 1 } );
"winningPlan" : {
    "inputStage" : {
        "stage" : "COLLSCAN"

> db.sysprofile.explain().find( { ns : 'school2.students', op : 'query', millis : { $gte : 20000 } }, { millis : 1 } );
"winningPlan" : {
    "inputStage" : {
        "stage" : "IXSCAN"
#Indexes-Sparse
● Sparse Indexes
○ Only contain entries for documents that contain the indexed field
○ Smaller and more efficient indexes
#Indexes-Sparse example

> use test
> db.test.createIndex( { b : 1 }, { sparse : true } );
> db.test.stats()
16384

> db.test.createIndex( { b : 1 } );
> db.test.stats()
2580480
#Storage Engines-WiredTiger
● Reduced storage space and higher I/O scalability
● It compresses indexes in RAM, freeing up Working Set space for documents
● Performance (granular concurrency control)
● Native compression:
○ per collection
○ per index
#Storage Engines-WiredTiger
● Types of compression:
○ Snappy (default): data and journal
○ zlib: maximum compression
○ Prefix: for indexes
#Storage Engines-In-Memory

● Extreme performance coupled with real-time analytics for the most demanding, latency-sensitive applications
● Delivers extreme throughput and predictable latency
● Eliminates the need for separate caching layers
#Query Optimizer
● It selects the best index to use
● It periodically runs alternate query plans and selects the index with the best response time for each query type
● The results of these tests are stored as a cached query plan and are updated periodically
#Aggregation Framework
● Use $indexStats to determine how frequently your indexes are used
● Each shard has its own statistics
● Stats are reset when you restart your mongod service

> db.collection.aggregate([ { $indexStats : {} } ])
#Aggregation Framework example
> db.test.aggregate([ { $indexStats : {} } ]).pretty()
{
    "name" : "b_1",
    "key" : {
        "b" : 1
    },
    "host" : "hostname:27017",
    "accesses" : {
        "ops" : NumberLong(4),
        "since" : ISODate("2016-11-07T00:05:39.771Z")
    }
}
...
#Tools-Compass
● MongoDB Compass
○ Analyze the size and usage of the indexes
○ It offers visual plan explanations
○ Index creation
○ Real-time stats
○ Identify and fix performance and data problems
#Tools-Ops Manager
● The Visual Query Profiler allows you to analyze query performance and recommends indexes to add
● It can be used to visualize profiler output when identifying slow queries
#Tools-Profiling
● Find the slowest queries so you can improve their indexes
● Log information for all operations, or only for those whose duration exceeds a configurable threshold (100 ms by default)
#Tools-Profiling example
> db.setProfilingLevel(2)
{ "was" : 0, "slowms" : 100, "ok" : 1 }
> db.example.find( { c : 98 } );
> db.system.profile.find().pretty()
{
    "op" : "query",
    "ns" : "test.example",
    "query" : {
        "find" : "example",
        "filter" : {
            "c" : 98
        }
    },
    ...
    "millis" : 6
#Tuning-SSD
● Most disk access patterns are not sequential
● Data files benefit from SSDs
● MongoDB journal files have highly sequential write patterns (good for spinning disks)
● RAM/SSDs are better than spinning drives
● Use high-performance storage
● Avoid networked storage
#Tuning-CPU
● MongoDB has better performance on faster CPUs
● The WiredTiger Storage Engine is better able to saturate multi-core processors than the MMAPv1 Storage Engine
#Tuning-Thread
● The WiredTiger Storage Engine is multi-threaded and can take advantage of many CPU Cores
#Sharding
● The balancer has been moved from the mongos to the primary member of the config server replica set
● Sharding improves read performance
● Benefits (even with a 'poor' shard key and scatter-gather queries):
○ A larger amount of memory to store indexes
○ Queries are parallelized across shards, reducing latency
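A minimal sketch of sharding a hypothetical collection on a high-cardinality key (database, collection, and field names are illustrative):

> sh.enableSharding("mydb")
> db.users.createIndex( { user_id : 1 } )              // the shard key must be indexed
> sh.shardCollection( "mydb.users", { user_id : 1 } )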
#Storage & Compression types
● Balance query latency with storage density and cost by assigning data sets based on a value such as a timestamp to specific storage devices
#Storage & Compression types
● Recent, frequently accessed data can be assigned to high-performance SSDs with Snappy compression
● Older, less frequently accessed data can be tagged to lower-throughput hard disk drives compressed with zlib, attaining maximum storage density at a lower cost-per-bit
● As data ages, MongoDB automatically migrates it between shards
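The tiering described above can be sketched with tag-aware sharding (shard, namespace, and field names are illustrative; the date boundary is an arbitrary example):

> sh.addShardTag( "shard0000", "ssd" )     // SSD-backed shard for recent data
> sh.addShardTag( "shard0001", "hdd" )     // HDD-backed shard for older data
> sh.addTagRange( "mydb.events",
    { created : ISODate("2016-01-01") }, { created : MaxKey }, "ssd" )
> sh.addTagRange( "mydb.events",
    { created : MinKey }, { created : ISODate("2016-01-01") }, "hdd" )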
MongoDB: Improve the performance of your application, by @juanroycouto
Thank you for your attention!