webinar: schema patterns and your storage engine
TRANSCRIPT
RSVP: mongodb.com/events
What’s coming in 3.4? San Francisco | 11/1
Schema Patterns and Your Storage Engine
Agenda
Schema Design 101
MMAPv1
WiredTiger
Examples
Update Process
Why do we have so many options?
MMAPv1, WT, 3rd Party
Available 3.2: In-memory, Encrypted
Your own?
Example of Document Definition
• App1
- Realtime dashboard
- Ad hoc queries
- User profiles
• App2
- Heavy writes batch process
- Analytics workload
- Multi-tenant application
Schema Design 101
Why is Schema Design Important?
• Defines how our applications interact with the data
• Shapes how we define the data
• Can considerably impact performance
• Impacts the DBA’s sleep!
{
  first_name: 'Paul',
  surname: 'Miller',
  cell: 447557505611,
  city: 'London',
  location: [45.123, 47.232],
  Profession: ['banking', 'finance', 'trader'],
  cars: [
    { model: 'Bentley', year: 1973, value: 100000, … },
    { model: 'Rolls Royce', year: 1965, value: 330000, … }
  ]
}
Fields can contain an array of sub-documents
Typed field values
Fields can contain arrays
Document Model
class Item(object):
    def __init__(self, name, car, date):
        self.name = name
        self.car = car
        self.date = date

class Car(object):
    def __init__(self, brand, manufactor, date):
        self.brand = brand
        self.manufactor = manufactor
        self.date = date
Natural Representation
Flexible Schema
Aligned with Development
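To make the "natural representation" point concrete, here is a minimal sketch of how the two classes could map straight onto a nested document. It uses dataclasses instead of the hand-written constructors, and the `to_document` helper and sample values are assumptions for illustration only.

```python
import datetime
from dataclasses import dataclass, asdict


@dataclass
class Car:
    brand: str
    manufactor: str  # spelling kept as on the slide
    date: datetime.date


@dataclass
class Item:
    name: str
    car: Car
    date: datetime.date


def to_document(item: Item) -> dict:
    """Turn the object graph into a nested dict, the shape a document would take."""
    return asdict(item)  # asdict recurses into the nested Car


day = datetime.date(2016, 6, 27)
doc = to_document(Item("item1", Car("somebrand", "somecarmaker", day), day))
print(doc)
```

The nested `car` dict lines up with the embedded sub-document shown on the slides, with no join table in between.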
{ _id: 1, 'greetings': 'hello' }
{ _id: 2, 'k': 'greetings', 'v': 'hello' }
{ _id: 1, 'k': 'greetings', 'v': 'hello' }
{
  _id: 2,
  'car': {
    'manufactor': 'somecarmaker',
    'brand': 'somebrand',
    'date': ISODate("2016-06-27")
  },
  'date': ISODate("2016-06-27")
}
What to Consider
Change Rate
Concurrency
Data Structure
Data Lifecycle
What to Consider
Data Structure
users accounts reviews products
What to Consider
Data Structure
accounts
users
products
reviews
What to Consider
Data Structure
Client1 Client2 Client3 Client4
What to Consider
{
  _id: 1,
  model: "ford",
  year: 2014,
  picture: BinData(2, "afGGAF677...")
}
{
  _id: 1,
  model: "ford",
  date: {year: 2014, month: 5, day: 1},
  picture: "AAFF123BEFC..."
}
{
  _id: 1,
  model: ObjectId("21213312"),
  date: ISODate("20140501"),
  picture: "AAFF123BEFC..."
}
Change Rate
Different data types
Different field structure
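When field types and structure change over time like this, the application has to cope with every version still stored. The sketch below is one hedged way to do that: a hypothetical `model_year` reader that accepts all three shapes. The sample documents are trimmed copies of the ones above, and a Python `datetime` stands in for `ISODate`.

```python
from datetime import datetime


def model_year(doc: dict) -> int:
    """Return the year regardless of which schema version the document uses."""
    if "year" in doc:                    # first shape: plain integer field
        return doc["year"]
    value = doc["date"]
    if isinstance(value, dict):          # second shape: {year, month, day} sub-document
        return value["year"]
    if isinstance(value, datetime):      # third shape: a real date type
        return value.year
    raise ValueError("unknown schema version")


v1 = {"_id": 1, "model": "ford", "year": 2014}
v2 = {"_id": 1, "model": "ford", "date": {"year": 2014, "month": 5, "day": 1}}
v3 = {"_id": 1, "model": "ford", "date": datetime(2014, 5, 1)}
print([model_year(d) for d in (v1, v2, v3)])
```

Every extra shape adds a branch to each reader, which is the cost of a high change rate.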
What to Consider
{
  "product_id": "b872ad6e34f87102cf866fead4e10e29",
  "purchases": {
    "2016": { "06": 2823, "05": 5535, ..., "total": 1222312 },
    "2015": {...},
    "2014": {...},
    "2013": {...}
  }
}
Data Lifecycle
What we actually read and write
What we no longer read or write
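One way to act on the lifecycle observation is to keep only the years still being read and written in the working document and move the rest elsewhere. This is a sketch under that assumption; `split_by_lifecycle` and the hot/cold naming are illustrative, and the older years are left empty because their contents are elided on the slide.

```python
def split_by_lifecycle(doc: dict, hot_years: set) -> tuple:
    """Return (hot, cold) documents, splitting the purchases sub-document by year."""
    hot = {"product_id": doc["product_id"], "purchases": {}}
    cold = {"product_id": doc["product_id"], "purchases": {}}
    for year, months in doc["purchases"].items():
        target = hot if year in hot_years else cold
        target["purchases"][year] = months
    return hot, cold


doc = {
    "product_id": "b872ad6e34f87102cf866fead4e10e29",
    "purchases": {
        "2016": {"06": 2823, "05": 5535, "total": 1222312},
        "2015": {},  # contents elided on the slide
        "2014": {},
        "2013": {},
    },
}
hot, cold = split_by_lifecycle(doc, hot_years={"2016"})
print(sorted(hot["purchases"]), sorted(cold["purchases"]))
```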
What to Consider
Concurrency
MMAPv1
Disclaimer!
WiredTiger is our default storage engine from 3.2 onwards
MMAPv1 / Basics
• Data is mapped into virtual memory for fast access
• Document pointers are requested per access
- If in memory = fast
- If not = disk seek
• Indexes follow the same structure
• Allocation is based on files per database
db.collection.update(
  {a: 1},
  {
    $set: {b: 1},
    $inc: {c: 10},
    $push: {d: "hello"}
  }
)
$set Operator
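For readers less familiar with the update operators used above, here is a small simulation of what `$set`, `$inc` and `$push` do to a document. It is a sketch in plain Python, not the server's implementation; `apply_update` is a hypothetical helper that handles top-level fields only.

```python
import copy


def apply_update(doc: dict, update: dict) -> dict:
    """Apply a small subset of update operators to a copy of doc."""
    result = copy.deepcopy(doc)
    for field, value in update.get("$set", {}).items():
        result[field] = value                           # overwrite or create the field
    for field, amount in update.get("$inc", {}).items():
        result[field] = result.get(field, 0) + amount   # a missing field starts from 0
    for field, value in update.get("$push", {}).items():
        result.setdefault(field, []).append(value)      # a missing field becomes an array
    return result


before = {"a": 1, "b": 0, "c": 5, "d": []}
after = apply_update(before, {"$set": {"b": 1}, "$inc": {"c": 10}, "$push": {"d": "hello"}})
print(after)
```

Each operator touches only the fields it names, so the client never has to send the whole document back.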
MMAPv1 / Schema Design Best Practices
db.collection.insert({
  _id: 1,
  name: 'Norberto',
  parking_tickets: {},
  eat_outs: [null, null, null, null]
})
Document Pre-allocation
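The pre-allocation idea, as I read the slide, is to insert the document with placeholder slots so that later writes fill existing space instead of growing the document. A minimal sketch of that pattern follows; `preallocated_profile`, `fill_slot` and the "pizza" value are made up for illustration.

```python
def preallocated_profile(user_id: int, name: str, slots: int = 4) -> dict:
    """Build a document whose array already holds `slots` null placeholders."""
    return {
        "_id": user_id,
        "name": name,
        "parking_tickets": {},
        "eat_outs": [None] * slots,  # None is stored as null
    }


def fill_slot(doc: dict, index: int, value) -> None:
    """Overwrite one placeholder; the array length stays the same."""
    doc["eat_outs"][index] = value


profile = preallocated_profile(1, "Norberto")
fill_slot(profile, 0, "pizza")
print(profile)
```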
db.movies.update(
  {_id: "b872ad6e34"},
  {$push: {
    reviews: {
      $each: ["great", "5*", "awful"],
      $slice: 10
    }
  }}
)
Keep Documents Small
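The `$slice` modifier is what keeps the array, and so the document, bounded. The sketch below mimics its behaviour on a plain list: a positive value keeps the first N elements and a negative value keeps the last N. `push_each_slice` and the sample reviews are illustrative only.

```python
def push_each_slice(array: list, each: list, slice_n: int) -> list:
    """Mimic $push with $each and $slice on a plain list."""
    combined = array + each
    if slice_n == 0:
        return []                    # zero empties the array
    if slice_n > 0:
        return combined[:slice_n]    # positive: keep the first N elements
    return combined[slice_n:]        # negative: keep the last N elements


reviews = ["r%d" % i for i in range(9)]  # nine existing reviews (made-up data)
capped = push_each_slice(reviews, ["great", "5*", "awful"], 10)
latest = push_each_slice(reviews, ["great", "5*", "awful"], -10)
print(capped)
print(latest)
```

With `$slice: 10` as on the slide the oldest ten entries survive; use a negative value if the newest reviews are the ones to keep.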
WiredTiger
WiredTiger / Basics
• MVCC Storage Engine
• Compression
• Document Level Concurrency Control
• Better resource allocation
WiredTiger Engine
[Diagram: WiredTiger engine architecture. APIs: Python API, C API, Java API. Components: Schema & Cursors, Transactions, Snapshots, Page read/write, Logging, Row storage, Column storage, Block management, Cache. On disk: Database Files, Log Files.]
WiredTiger / Operational Best Practices
Cache Size
[Diagram: the server's Disk and RAM; RAM is divided between cacheSizeGB, Heap, and the OS]
Document Level Concurrency Control
[Diagram: mongod holds several collections; one collection holds Doc1, Doc2, Doc3, Doc4, Doc5, Doc6 ... DocX]
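To put numbers on the RAM split, the sketch below applies the default cache rule documented for MongoDB 3.2 (the larger of 60% of RAM minus 1 GB, or 1 GB). Other releases use a different formula, so treat this as an assumption to check against the manual for your version. The function names are hypothetical.

```python
def default_cache_gb_32(ram_gb: float) -> float:
    """Default WiredTiger cache size in GB under the rule documented for MongoDB 3.2."""
    return max(0.6 * ram_gb - 1.0, 1.0)


def remaining_ram_gb(ram_gb: float, cache_gb: float) -> float:
    """RAM left for the heap, the OS and its filesystem cache."""
    return ram_gb - cache_gb


for ram in (2, 16, 64):
    cache = default_cache_gb_32(ram)
    print(ram, round(cache, 1), round(remaining_ram_gb(ram, cache), 1))
```

Setting cacheSizeGB explicitly overrides this default, which matters when several mongod processes share one server.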
MMAPv1 / Operational Best Practices
Document Level Concurrency Control
[Diagram: mongod with Collection1 through CollectionX, one document per collection (doc1 through docX)]
WiredTiger / Operational Best Practices
Cache Size, Compression
[Diagram: Disk and RAM; RAM divided between cacheSizeGB, Heap, and the OS; mongod with several collections holding Doc1 ... DocX]
RAM
{
  "message": "hello MongoDB World",
  "user": "Norberto",
  "channel": "twitter",
  "comments": [ {"text": "howdy", "user": "Ross"} ]
}
gzip(0x0012314001222321)
WiredTiger / Schema Design - Compression
MMAPv1
{
  "pid": "b872ad6e34",
  "n": "MongoDB Atlas",
  "c": "MongoDB",
  "d": "MongoDBAAS",
  "cs": [ {"t": "Beautiful"} ]
}
WiredTiger
{
  "product_id": "b872ad6e34",
  "name": "MongoDB Atlas",
  "company": "MongoDB",
  "description": "MongoDBAAS",
  "comments": [ {"text": "Beautiful"} ]
}
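A rough way to see why short field names matter less once blocks are compressed: the sketch below serializes many copies of each document and compares raw and compressed sizes. JSON and Python's `zlib` stand in for BSON and the engine's block compressor here, and identical copies exaggerate the effect, so read the numbers as indicative only.

```python
import json
import zlib

long_doc = {"product_id": "b872ad6e34", "name": "MongoDB Atlas", "company": "MongoDB",
            "description": "MongoDBAAS", "comments": [{"text": "Beautiful"}]}
short_doc = {"pid": "b872ad6e34", "n": "MongoDB Atlas", "c": "MongoDB",
             "d": "MongoDBAAS", "cs": [{"t": "Beautiful"}]}


def block_sizes(doc: dict, copies: int = 100) -> tuple:
    """Return (raw_bytes, compressed_bytes) for a block of `copies` serialized documents."""
    raw = "\n".join(json.dumps(doc) for _ in range(copies)).encode("utf-8")
    return len(raw), len(zlib.compress(raw))


long_raw, long_zip = block_sizes(long_doc)
short_raw, short_zip = block_sizes(short_doc)
print("raw:", long_raw, short_raw)
print("compressed:", long_zip, short_zip)
```

Uncompressed, the long names cost space in every document; after compression the repeated names nearly vanish, so readable names become affordable.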
WiredTiger / Schema Design – Index Prefix Compression
{
  "user": "Norberto",
  "country": "Portugal",
  "last_comment": "European Champions!!!"
}
{
  "user": "Norberto",
  "country": "Portugal",
  "last_comment": "Iceland beat England :-)"
}
db.users.createIndex( {"country": 1} )
[Diagram: index tree. Root: Australia-Zimbabwe; next level: A-N, N-Z; next level: N-Q, Q-Z…; leaf entry: record_id: 0x12311]
{key: "Por", record_id: 0x12311}
{key: "Por", record_id: 0x123BB}
{key: "Pol", record_id: 0xF23CB}
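The general idea behind prefix compression can be sketched in a few lines: in a sorted run of keys, each key stores only how many leading characters it shares with the previous key, plus the differing suffix. This illustrates the technique in general; it is not WiredTiger's actual on-disk format, and the key list is made up.

```python
import os


def prefix_compress(sorted_keys: list) -> list:
    """Encode each key as (shared_prefix_length, suffix) relative to the previous key."""
    encoded, previous = [], ""
    for key in sorted_keys:
        shared = len(os.path.commonprefix([previous, key]))
        encoded.append((shared, key[shared:]))
        previous = key
    return encoded


def prefix_decompress(encoded: list) -> list:
    """Rebuild the full keys from (shared_prefix_length, suffix) pairs."""
    keys, previous = [], ""
    for shared, suffix in encoded:
        previous = previous[:shared] + suffix
        keys.append(previous)
    return keys


keys = ["Poland", "Portugal", "Portugal", "Portugal"]
packed = prefix_compress(keys)
print(packed)
```

Low-cardinality fields such as a country name repeat the same key many times, so most entries shrink to almost nothing.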
Does Picking a Storage Engine Matter?
In-place Updates
Memory
{country: "Portugal", star: "Cristiano"}
{country: "Iceland", star: "Hannes Magnusson"}
{country: "Netherlands", star: "Arthur Viegers"}
{country: "Portugal", star: "Crestiano"}
In-place Updates
MMAPv1
Memory
{country: "Portugal", star: "Cristiano"}
{country: "Iceland", star: "Hannes Magnusson"}
{country: "Netherlands", star: "Arthur Viegers"}
db.teams.update(
  {country: "Portugal"},
  {"$set": {"star": "Cristiano Ronaldo"}}
)
{country: "Portugal", star: "Cristiano Ronaldo"}
In-place Updates
WiredTiger
Memory
version 1
{country: "Portugal", star: "Crestiano"}
db.teams.update(
  {country: "Portugal"},
  {"$set": {"star": "Cristiano Ronaldo"}}
)
version 2
{country: "Portugal", star: "Cristiano Ronaldo"}
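The version 1 / version 2 picture is the multi-version (MVCC) idea: an update writes a new version rather than overwriting the old one, and a reader sees the version that matches its snapshot. The toy model below shows that behaviour only; `VersionedRecord` is a hypothetical class and says nothing about how WiredTiger stores or discards old versions.

```python
class VersionedRecord:
    """Keeps every written version; readers pick the one visible at their snapshot."""

    def __init__(self, value: dict):
        self.versions = [value]  # version 1 lives at index 0

    def update(self, changes: dict) -> int:
        """Write a new version (previous value plus changes) and return its number."""
        self.versions.append({**self.versions[-1], **changes})
        return len(self.versions)

    def read(self, version: int) -> dict:
        """Read the record as of a given 1-based version number."""
        return self.versions[version - 1]


record = VersionedRecord({"country": "Portugal", "star": "Crestiano"})
latest = record.update({"star": "Cristiano Ronaldo"})
print(record.read(1), record.read(latest))
```

A reader that started before the update keeps seeing version 1, so readers and the writer do not block each other.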
Insert Heavy
db.pool.insert({ "subject": "Euro 2016", "winner": "Portugal" })
server: mongod
Insert Heavy
MMAPv1
db.pool.insert({ "subject": "Euro 2016", "winner": "Portugal" })
Sharding
Server 1: mongod
Server 2: mongod
Server 3: mongod
Insert Heavy
MMAPv1
db.pool.insert({ "subject": "Euro 2016", "winner": "Portugal" })
Micro-Sharding
server: mongod, mongod, mongod
https://www.mongodb.com/blog/post/mongodb-single-platform-all-financial-data-ahl
Insert Heavy
WiredTiger
db.pool.insert({ "subject": "Euro 2016", "winner": "Portugal" })
server: mongod
CPU availability
Insert Heavy
WiredTiger
db.pool.insert({ "subject": "Euro 2016", "winner": "Portugal" })
Sharding
server: mongod
server: mongod
server: mongod
Buckets
{
  "product_id": "b872ad6e34f87...",
  "visits": {
    "store1": { "Jan_31": 2823, "Jan_30": 5535, ..., "total": 1222312 },
    "store_2": { "Jan_31": 2823, "Jan_28": 5535, ..., "total": 1222312 }
  }
}
db.visits.update(
  {"product_id": "b872ad6e34f87...."},
  {"$inc": {
    "visits.store_2.March_19": 10,
    "visits.store1.April_1": 68,
    "stores.total": 78
  }}
)
Buckets
MMAPv1
In-place update
{
  "product_id": "b872ad6e34f87...",
  "visits": {
    "store1": { "Jan_31": 2823, "Jan_30": 5535, "April_1": 10, "total": 1222312 },
    "store_2": { "Jan_31": 2823, "Jan_28": 5535, "March_19": 68, "total": 1222312 }
  }
}
Buckets
WiredTiger
{
  "product_id": "b872ad6e34f87...",
  "visits": {
    "store1": { "Jan_31": 2823, "Jan_30": 5535, ..., "total": 1222312 },
    "store_2": { "Jan_31": 2823, "Jan_28": 5535, ..., "total": 1222312 }
  }
}
New version
{
  "product_id": "b872ad6e34f87...",
  "visits": {
    "store1": { "Jan_31": 2823, "Jan_30": 5610, "April_1": 10, "total": 1222312 },
    "store_2": { "Jan_31": 2823, "Jan_28": 5545, "March_19": 68, "total": 12229032 }
  }
}
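For the bucket pattern, the update document is usually generated rather than typed by hand. Below is a hedged sketch of such a builder: `visit_increment` is a hypothetical helper, and keeping one running total per store is my assumption about how the totals are maintained, not something the slides spell out.

```python
def visit_increment(counts: dict) -> dict:
    """Build a $inc update from {(store, day): n}, keeping each store's total in step."""
    inc = {}
    for (store, day), n in counts.items():
        inc["visits.%s.%s" % (store, day)] = n           # per-day counter in the bucket
        total_key = "visits.%s.total" % store
        inc[total_key] = inc.get(total_key, 0) + n       # running total for that store
    return {"$inc": inc}


update = visit_increment({("store_2", "March_19"): 10, ("store1", "April_1"): 68})
print(update)
```

One update touches a handful of counters inside a large document, which is why the in-place versus new-version difference between the engines matters for this pattern.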
Moving to WiredTiger
What to Look For
Bucketing
Read/Write Ratio
In-place Update
Update Process
• Apply changes to your schema design
• Check if there's any performance regression
• Make sure you have CPU resources available
• Swap secondary nodes
• Swap primary
• Enjoy the increased performance!
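The ordering in the steps above (secondaries first, primary last) can be written down as a tiny planning helper. This is only a sketch of the ordering; the member names are made up, and the actual swap of each node is an operational procedure not shown here.

```python
def rolling_swap_order(members: dict) -> list:
    """Order replica set members for an engine swap: secondaries first, primary last."""
    secondaries = sorted(name for name, role in members.items() if role == "secondary")
    primaries = sorted(name for name, role in members.items() if role == "primary")
    return secondaries + primaries


order = rolling_swap_order({"node-a": "primary", "node-b": "secondary", "node-c": "secondary"})
print(order)
```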
Example of Document Definition
• App1: WiredTiger or MMAPv1
- Realtime dashboard
- Ad hoc queries
- User profiles
• App2: WiredTiger
- Heavy writes batch process
- Analytics workload
- Multi-tenant application
M310: MongoDB Security
https://university.mongodb.com/courses/M310/about
Feel free to reach out!
Twitter: @[email protected]#mongodbeurope
Obrigado (Thank you)