Post on 05-Jun-2020

Confidential

MongoDB 101 (session one)

Art van Scheppingen

Senior Support Engineer - Severalnines AB

Who am I?

☐ Senior Support Engineer at Severalnines AB
☐ Worked with MySQL for over 16 years
☐ Has been in a DBA environment for 6 years
☐ Polyglot persistence proponent
☐ Organizer of Polyglot Persistence meetups (Severalnines)
  ☐ Amsterdam, Berlin, Paris, London, Stockholm and Dublin

Who is Severalnines?

☐ Database automation / orchestration
☐ Software to deploy, monitor, manage and scale
☐ Support for MySQL (all flavours), MongoDB and PostgreSQL
☐ Main product: ClusterControl

Orchestration System: ClusterControl

☐ ClusterControl
  ☐ http://severalnines.com/getting-started
  ☐ Deploy
  ☐ Monitor
  ☐ Manage
  ☐ Scale
☐ Community edition

MongoDB support

Logistics

Morning / Afternoon sessions

☐ Morning session - Art van Scheppingen
  ☐ Basics
  ☐ Cluster/Schema Design patterns
☐ Afternoon session - David Murphy / Kim
  ☐ Sharding
  ☐ Engines
  ☐ Common Operations issues

Agenda for this morning

☐ Part 1: Basics (9:00 - 10:30)
  ☐ MongoDB Primer
  ☐ Running mongod
  ☐ Mongo command basics
  ☐ CRUD: create, read, update, delete
☐ Part 3: Essentials (11:00 - 12:00)
  ☐ Aggregate
  ☐ Import / export data
  ☐ Backup / restore
  ☐ Schema design patterns

Prerequisites

☐ MongoDB Community installed locally
  ☐ https://www.mongodb.com/download-center
☐ Download the zip code data set:
  ☐ http://media.mongodb.org/zips.json
☐ You have to know and understand JSON data structures

Prerequisites for this afternoon

☐ Clone the following GitHub repo:
  ☐ https://github.com/dbmurphy/MongoDB32Labs
☐ If you are not running on Linux: run a VM with Linux
☐ Recommended install for VMs:
  ☐ Install VirtualBox
    ☐ https://www.virtualbox.org/wiki/Downloads
  ☐ Download Fedora or Ubuntu
    ☐ https://getfedora.org/en/cloud/download/
    ☐ https://cloud-images.ubuntu.com/xenial/current/

Windows users

☐ Windows installations require Windows PowerShell
☐ Set path:
  $env:Path += ";c:\Program Files\MongoDB\Server\3.2\bin\"
☐ The --fork parameter does not work in the Windows binary
  ☐ Solution: open multiple command / shell windows and leave MongoDB running in the foreground

Basics: MongoDB

What is MongoDB?

☐ Document data store
  ☐ Not a key/value store!
  ☐ Data stored in JSON
☐ Philosophy
  ☐ Flexibility
  ☐ Scalability
  ☐ Geo distributed
  ☐ Strong consistency
☐ Originally data was only stored as BSON on disk (MMAP)
  ☐ BSON is binary JSON
  ☐ MongoDB v3.0 allows other storage engines
☐ No-SQL?
  ☐ JavaScript based query language
  ☐ Similar feature set in MongoDB queries

What are MongoDB's advantages?

☐ Doesn’t require a lot of memory
  ☐ No preallocated buffer pools (except for WiredTiger)
  ☐ Makes use of the filesystem cache to cache data
  ☐ Indexes are loaded in memory
☐ Allows high levels of concurrency
☐ Strong consistency
☐ Easy for scaling reads
  ☐ Scale replicaSets
☐ Easy for scaling writes
  ☐ Scale shards by adding replicaSets

What are MongoDB's disadvantages?

☐ It is a different approach to problems
  ☐ Different solutions
☐ Not ACID compliant
  ☐ Atomicity only for single-document operations; locking is per collection (MMAP) or per document (WiredTiger)
  ☐ But there are transaction-like semantics

Terminology

☐ Database
  ☐ Contains collections
☐ Collections
  ☐ Collection of documents (think of a table)
☐ Document
  ☐ BSON document (may contain links to docs in other collections)
  ☐ BSON is a binary representation of JSON
  ☐ Document size is limited to 16MB (megabytes)
☐ Fields
  ☐ The properties of a BSON object (think of columns)

Basics: document example

{

"_id" : ObjectId("57e171765ffbf76ca639bd65"),

"foo" : "bar",

"counter" : NumberLong(10010101),

"array" : [

"one",

"two",

"three"

],

"subbranch" : {

"another" : "json",

"object" : "in here"

}

}

Basics: fields

☐ Field names are strings
☐ Field names may not start with the dollar sign character ($)
  ☐ Reserved for (matching) functions and operators
☐ Field names may not contain dot characters (.)
  ☐ Dot notation is used to access arrays and embedded documents
☐ Field names may not contain a null character
☐ Field name “_id” is reserved
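
The field name rules can be written down as a small check. This is a minimal sketch in plain JavaScript; `isValidFieldName` is a hypothetical helper written for illustration, not part of the mongo shell, and it only covers the three character rules listed on the slide.

```javascript
// Hypothetical helper (not a MongoDB API): applies the character rules
// from the slide to a candidate field name.
function isValidFieldName(name) {
  if (typeof name !== "string") return false; // field names are strings
  if (name.startsWith("$")) return false;     // "$" prefix is reserved for operators
  if (name.includes(".")) return false;       // "." is the path separator in dot notation
  if (name.includes("\0")) return false;      // null characters are not allowed
  return true;
}

console.log(isValidFieldName("foo"));        // true
console.log(isValidFieldName("$set"));       // false
console.log(isValidFieldName("sub.branch")); // false
```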

Basics: accessing array and embedded fields

{

"_id" : ObjectId("57e171765ffbf76ca639bd65"),

"foo" : "bar",

"counter" : NumberLong(10010101),

"array" : [

"one",

"two",

"three"

],

"subbranch" : {

"another" : "json",

"object" : "in here"

}

}

"array.2"

"subbranch.object"
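
To make the dot notation concrete, here is a small sketch in plain JavaScript that resolves such a path against the example document. `getByPath` is a hypothetical helper for illustration only; MongoDB resolves these paths itself inside queries and updates.

```javascript
// Hypothetical helper: walks a dotted path through arrays and embedded documents.
function getByPath(doc, path) {
  return path.split(".").reduce(
    (value, key) => (value === undefined || value === null ? undefined : value[key]),
    doc
  );
}

// Same shape as the example document (without the _id and counter fields).
const exampleDoc = {
  foo: "bar",
  array: ["one", "two", "three"],
  subbranch: { another: "json", object: "in here" }
};

console.log(getByPath(exampleDoc, "array.2"));          // three
console.log(getByPath(exampleDoc, "subbranch.object")); // in here
```

Array positions are zero-based, so "array.2" points at the third element.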

Basics: single server

Basics: Running mongod

☐ Running mongod in the foreground
  mongod
  mongod --port <port> --bind_ip <hostname> --dbpath ~/data/db
☐ Running mongod in the background (fork); --fork needs a log destination such as --logpath
  mongod --fork --logpath <logfile>

Basics: Checkpointing

☐ MongoDB checkpoints every 60 seconds (both MMAP and WiredTiger)

☐ Between checkpoints all modifications are written to the journal (every 100ms)

Exercise 1: run mongod on your laptop

☐ Create the data directory and run mongod (Linux/Mac):
  mkdir plam101
  mongod --dbpath plam101 --logpath mongo-101.log --fork
☐ Create the data directory and run mongod (Windows):
  mkdir plam101
  mongod --dbpath plam101 --logpath mongo-101.log

Exercise 1: connect to mongod

☐ Verify that mongod is running (Linux/MacOS):
  ps ax | grep mongod
  tail mongo-101.log
  mongo
☐ Verify that mongod is running (Windows):
  Get-Process mongod
  Get-Content -Path mongo-101.log
  mongo

Exercise 1: directory layout

$ ls -la plam101/
total 163848
drwxr-xr-x 8 youruser staff 272 Sep 30 11:48 .
drwxr-xr-x 5 youruser staff 170 Sep 30 11:48 ..
drwxr-xr-x 2 youruser staff 68 Sep 30 11:48 _tmp
drwxr-xr-x 2 youruser staff 68 Sep 30 11:48 journal
-rw------- 1 youruser staff 67108864 Sep 30 11:48 local.0
-rw------- 1 youruser staff 16777216 Sep 30 11:48 local.ns
-rw-r--r-- 1 youruser staff 0 Sep 30 11:48 mongod.lock
-rw-r--r-- 1 youruser staff 69 Sep 30 11:48 storage.bson

Creating databases

☐ MongoDB databases are created implicitly when changing to a non existent database and inserting data

> use new_database

switched to db new_database

> show databases

local 0.000GB

> db.somecollection.insert({"foo": "bar"})

WriteResult({ "nInserted" : 1 })

> show databases

local 0.000GB

new_database 0.000GB

Creating databases: directory layout

$ ls -la plam101/
total 163848
drwxr-xr-x 8 youruser staff 272 Sep 30 11:48 .
drwxr-xr-x 5 youruser staff 170 Sep 30 11:48 ..
drwxr-xr-x 2 youruser staff 68 Sep 30 11:48 _tmp
drwxr-xr-x 2 youruser staff 68 Sep 30 11:48 journal
-rw------- 1 youruser staff 67108864 Sep 30 11:48 local.0
-rw------- 1 youruser staff 16777216 Sep 30 11:48 local.ns
-rw-r--r-- 1 youruser staff 0 Sep 30 11:48 mongod.lock
-rw-r--r-- 1 youruser staff 69 Sep 30 11:48 storage.bson
-rw------- 1 youruser staff 67108864 Sep 30 11:52 new_database.0
-rw------- 1 youruser staff 16777216 Sep 30 11:52 new_database.ns

Dropping databases

☐ To drop a database you first have to change to the database you want to drop; only then can you drop it

> use new_database

switched to db new_database

> db.dropDatabase()

{ "dropped" : "new_database", "ok" : 1 }

Basics: replicaSets

Basics: replicaSet

☐ Replication is transported through the “oplog”
☐ The oplog is a special collection in a replicaSet
  ☐ All transactions are stored in the oplog (except for local db)
  ☐ Oplog resides inside the local database
  ☐ Limited in size (on disk)
  ☐ Sliding window of transactions
  ☐ Purging of transactions happens via FIFO
☐ Oplog durability is one of the most important metrics
  ☐ Time between the first and the last transaction in the oplog: durability in seconds
☐ All nodes send/receive heartbeats

Basics: initial sync

☐ Adding a new secondary:
  ☐ Node gets added to the cluster
  ☐ Cluster will check how advanced the new secondary is (last executed transaction)
  ☐ If the secondary is too far behind, an initial sync is executed
    ☐ Copy document by document
☐ Kickstarting / seeding
  ☐ Make a full (binary) copy of the primary to the secondary
  ☐ Add to the cluster
  ☐ Only the last transactions from the oplog will be sent
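
The "too far behind" decision can be pictured as a comparison between the secondary's last applied transaction and the oldest entry still in the primary's oplog. The sketch below is plain JavaScript with timestamps as simple numbers; `canCatchUpFromOplog` is a hypothetical name and this is an illustration of the idea, not the server's actual logic.

```javascript
// Hypothetical model: a secondary can resume from the oplog only if the
// transaction it last applied has not yet been purged from the sliding window.
function canCatchUpFromOplog(secondaryLastApplied, oplogFirstEvent) {
  return secondaryLastApplied >= oplogFirstEvent;
}

const oplogFirstEvent = 1000; // oldest transaction still in the oplog (assumed value)

console.log(canCatchUpFromOplog(1500, oplogFirstEvent)); // true: replay from the oplog
console.log(canCatchUpFromOplog(400, oplogFirstEvent));  // false: initial sync needed
```

This is also why seeding from a binary copy works: the copy only has to be newer than the oldest oplog entry.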

Exercise 2: run a replicaSet

☐ Kill previous daemon
  killall mongod
☐ Create the data directories and run mongod:
  mkdir plam101-rs1 plam101-rs2 plam101-rs3
  mongod --dbpath plam101-rs1 --logpath mongo-101-rs1.log --port 27001 --replSet myrs --fork
  mongod --dbpath plam101-rs2 --logpath mongo-101-rs2.log --port 27002 --replSet myrs --fork
  mongod --dbpath plam101-rs3 --logpath mongo-101-rs3.log --port 27003 --replSet myrs --fork

Exercise 2: initiate the replicaSet

☐ Connect to the first node
  $ mongo --port 27001
  connecting to: 127.0.0.1:27001/test
  >
☐ Initiate the replicaSet:
  > rs.initiate()
  {
    "info2" : "no configuration explicitly specified -- making one",
    "me" : "<yourhost>.local:27001",
    "ok" : 1
  }
  myrs:SECONDARY>
  myrs:PRIMARY> rs.status()

Exercise 2: add new members

☐ Connect to the first node
  $ mongo --port 27001
  myrs:PRIMARY>
☐ Add the other members:
  myrs:PRIMARY> rs.add("<yourhost>.local:27002")
  { "ok" : 1 }
  myrs:PRIMARY> rs.add("<yourhost>.local:27003")
  { "ok" : 1 }

Exercise 2: Watch the oplog grow

☐ Run rs.printReplicationInfo()
  myrs:PRIMARY> rs.printReplicationInfo()
☐ Insert some big documents (should take a few minutes):
  myrs:PRIMARY> var doc = {"foo": "bar"}
  myrs:PRIMARY> for (i = 0; i < 10000; i++) { doc["foo"] += i; db.inserttest.insert(doc); }
  WriteResult({ "nInserted" : 1 })
☐ Run rs.printReplicationInfo() again and spot the difference
  myrs:PRIMARY> rs.printReplicationInfo()

Exercise 2: Maybe adjust the oplog size?

myrs:PRIMARY> rs.printReplicationInfo()

configured oplog size: 192MB

log length start to end: 183secs (0.05hrs)

oplog first event time: Tue Sep 27 2016 16:19:36 GMT+0200 (CEST)

oplog last event time: Tue Sep 27 2016 16:22:39 GMT+0200 (CEST)

now: Tue Sep 27 2016 16:25:22 GMT+0200 (CEST)
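
The "log length start to end" figure is simply the time between the first and last oplog events. A minimal plain JavaScript sketch using the two timestamps from the output above (`oplogWindowSeconds` is a hypothetical helper name):

```javascript
// Hypothetical helper: oplog window = last event time minus first event time.
function oplogWindowSeconds(firstEvent, lastEvent) {
  return (lastEvent.getTime() - firstEvent.getTime()) / 1000;
}

// Timestamps taken from the printReplicationInfo() output (CEST = UTC+2).
const firstEvent = new Date("2016-09-27T16:19:36+02:00");
const lastEvent = new Date("2016-09-27T16:22:39+02:00");

const secs = oplogWindowSeconds(firstEvent, lastEvent);
console.log(secs + "secs (" + (secs / 3600).toFixed(2) + "hrs)"); // 183secs (0.05hrs)
```

A window of about three minutes means a secondary that is down for longer than that cannot catch up from the oplog, which is the motivation for a larger oplog.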

Demo: oplog too small, full sync necessary

☐ Stop one secondary
☐ Insert some big documents (should take a few minutes):
  myrs:PRIMARY> var doc = {"foo": "bar"}
  myrs:PRIMARY> for (i = 0; i < 10000; i++) { doc["foo"] += i; db.inserttest.insert(doc); }
  WriteResult({ "nInserted" : 1 })
☐ Start the secondary again and watch the log file

Basics: High Availability

Basics: node failure

☐ After a primary is lost (heartbeat timeout) no write operations can happen
☐ Read operations can still happen
☐ Remaining nodes start electing a new primary

Basics: election voting

☐ All remaining nodes vote for a new primary
☐ Priority: higher values make a node more eligible to become a primary
☐ Votes: allows a node to vote for a new primary
☐ Only nodes with priority and voting power can vote
  ☐ You can set the priority (numeric) per node
  ☐ You can set the voting (on and off) per node
☐ Up to 7 nodes can vote
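
A new primary can only be elected when a majority of the voting members can reach each other. The arithmetic can be sketched in plain JavaScript; `votesNeeded` and `canElectPrimary` are hypothetical helpers for illustration, and priorities are left out of this simplified model.

```javascript
// Hypothetical helpers: a strict majority of voting members is required.
function votesNeeded(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

function canElectPrimary(votingMembers, reachableVoters) {
  return reachableVoters >= votesNeeded(votingMembers);
}

console.log(votesNeeded(3));        // 2
console.log(canElectPrimary(3, 2)); // true: two of three nodes still see each other
console.log(canElectPrimary(3, 1)); // false: a single isolated node cannot elect itself
console.log(votesNeeded(7));        // 4
```

This is why replicaSets are usually built with an odd number of voting members.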

Basics: node recovery

Basics: durability

☐ Replication happens asynchronously, like MySQL replication
  ☐ Eventual consistency
☐ Write concern
  ☐ Wait for confirmation from secondary nodes
    ☐ numeric, majority or <tag>
  ☐ Wait for write to journal


Basics: understanding eventual consistency

Basics: read from secondary

Basics: eventual consistency and secondary

Basics: arbiter node

☐ The arbiter node will not store any data
☐ The arbiter node will confirm writes
☐ The arbiter node will take part in voting for a new primary

Demo: node recovery on a replicaSet

☐ 3 node cluster (node 3 is arbiter)
☐ Data gets inserted on Primary (node 1)
☐ Secondary (node 2) fails
☐ Some time passes
☐ Primary (node 1) fails and node 2 comes back up
☐ Node 2 becomes primary
☐ Inserting data into new primary (node 2)
☐ Node 1 comes up again and realizes it is no longer primary
☐ Fetches oplog from the new primary which is more advanced
☐ Node 1 performs a rollback

Basics: Sharding

Basics: sharded cluster

Demo: sharding

Running commands in the mongo shell

☐ MongoDB built in commands
  ☐ db.help() is your friend
☐ Collection built in commands
  ☐ db.<collection>.help()
☐ For replicaSets and shards the helpers start with rs and sh
  ☐ rs.help()
  ☐ sh.help()

Scripting in the mongo shell

☐ Mongo shell runs JavaScript
☐ Create variables, functions, etc
☐ You can iterate over cursors

cursor = db.collection.find();
while ( cursor.hasNext() ) {
  printjson( cursor.next() );
}

cursor = db.collection.find();
while ( cursor.hasNext() ) {
  doc = cursor.next();
  doc["newfield"] = "something";
  db.collection.save(doc);
}

Recap

☐ MongoDB basics
  ☐ MongoDB terminology
  ☐ replicaSet and replication
  ☐ Durability and eventual consistency
  ☐ Sharding
☐ Running MongoDB
  ☐ Single server
  ☐ ReplicaSet

Basics: CRUD

Create: inserting data

☐ As you have seen, inserting data is very simple:
  db.<collection>.insert(document, { [writeConcern], [ordered] })
  db.<collection>.insertOne(document, { [writeConcern] }) (3.2)
  db.<collection>.insertMany(documents, { [writeConcern], [ordered] }) (3.2)
☐ Comparable to SQL:
  INSERT INTO <collection> VALUES (document);
☐ Ordered defaults to true

Create: inserting data

☐ Every document must have an identifier (_id)
☐ If no identifier (_id) has been provided, one will be generated
☐ For example:

> db.somecollection.insert({"foo": "bar"})

WriteResult({ "nInserted" : 1 })

> db.somecollection.insert({"_id": "1234","foo": "bar"})

WriteResult({ "nInserted" : 1 })

> db.somecollection.insert({"_id": "foobar","foo": "bar"})

WriteResult({ "nInserted" : 1 })

Create: inserting multiple documents

> db.somecollection.insert( [ {"foo": "bar"}, {"_id": "1234","foo": "bar"}, {"_id": "foobar","foo": "bar"} ] )

BulkWriteResult({

"writeErrors" : [ ],

"writeConcernErrors" : [ ],

"nInserted" : 3,

"nUpserted" : 0,

"nMatched" : 0,

"nModified" : 0,

"nRemoved" : 0,

"upserted" : [ ]

})

Insert: insert durability (writeConcern)

☐ Passed as an option to the insert method:
  db.<collection>.insert(document, { writeConcern: {w: <value>, j: <boolean>, wtimeout: <number>}})
☐ W option
  ☐ Wait for confirmation from other nodes
  ☐ Number, “majority” or <tag>
☐ J option
  ☐ Setting to true will wait for the journal write
☐ Wtimeout option
  ☐ Timeout for the writeConcern
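
How the w option translates into a number of required confirmations can be sketched in plain JavaScript. `requiredAcks` and `isSatisfied` are hypothetical helpers; the model ignores the j and wtimeout options and tag-based write concerns, and it counts all members as voting, data-bearing nodes.

```javascript
// Hypothetical model of the w option: how many nodes must confirm a write.
function requiredAcks(w, memberCount) {
  if (w === "majority") return Math.floor(memberCount / 2) + 1;
  if (Number.isInteger(w) && w >= 0) return w;
  throw new Error("tag-based write concerns are not modelled in this sketch");
}

function isSatisfied(writeConcern, acks, memberCount) {
  return acks >= requiredAcks(writeConcern.w, memberCount);
}

// Three-member replicaSet, write confirmed by the primary only (1 ack):
console.log(isSatisfied({ w: 1 }, 1, 3));          // true
console.log(isSatisfied({ w: "majority" }, 1, 3)); // false, needs 2 confirmations
console.log(isSatisfied({ w: 2, j: true, wtimeout: 100 }, 2, 3)); // true
```

In the real server a write that does not gather enough confirmations within wtimeout returns a write concern error; the write itself is not undone.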

Insert: wait for journal write

Insert: wait for other node to write

Insert: insert durability

☐ Examples:

> db.somecollection.insert({"foo": "bar"}, {writeConcern: { w: 1, j: false}} )

WriteResult({ "nInserted" : 1 })

> db.somecollection.insert({"foo": "bar"}, {writeConcern: { w: "majority", j: true}} )

WriteResult({ "nInserted" : 1 })

> db.somecollection.insert({"foo": "bar"}, {writeConcern: { w: 2, j: true, wtimeout: 100}} )

WriteResult({ "nInserted" : 1 })

Create: inserting multiple documents

☐ As you have seen, inserting data is very simple:
  db.<collection>.insert(documents, { [writeConcern], [ordered] })
☐ Comparable to SQL:
  INSERT INTO <collection> VALUES (document);
☐ Ordered defaults to true
☐ writeConcern will be explained in the afternoon session

Create: exercise

1. Create a new collection named mytest in the test database by inserting the following JSON document:
   { "_id": 1, "name": "mytest1" }
2. Insert a second document in the mytest collection:
   { "_id": 2, "name": "mytest2", "testdata": "test1234" }
3. Insert a couple of documents in the mytest collection:
   [ { "name": "mytest3", "testdata": "test1234" },
     { "name": "mytest4", "testdata": "test1234" },
     { "name": "mytest5", "testdata": "test1234" } ]

Create: exercise answer

> use test

> db.mytest.insert({ "_id": 1, "name": "mytest1" })

WriteResult({ "nInserted" : 1 })

> db.mytest.insert({ "_id": 2, "name": "mytest2", "testdata": "test1234" })

WriteResult({ "nInserted" : 1 })

> db.mytest.insert( [ { "name": "mytest3", "testdata": "test1234" }, { "name": "mytest4", "testdata": "test1234" }, { "name": "mytest5", "testdata": "test1234" } ] )

BulkWriteResult({

"nInserted" : 3,

...

})

Read: finding your data

☐ The find command will retrieve your data as a cursor
  db.<collection>.find(query, projection)
  db.<collection>.findOne(query, projection)
  var cursor = db.<collection>.find(query, projection)
☐ Query: selection filter using query operators
☐ Projection: fields to return from the document
☐ SQL equivalent:
  SELECT projection FROM collection WHERE query

Read: finding your data

☐ Example:

> db.somecollection.find({"_id": "1234"}, {"_id": 1, "foo": 1})

{ "_id" : "1234", "foo" : "bar" }

> var cursor = db.somecollection.find({"_id": "1234"}, {"_id": 1, "foo": 1})

> while (cursor.hasNext()) { printjson(cursor.next()); }

{

"_id" : "1234",

"foo" : "bar"

}

Read: query operators

☐ Basic query operators are $eq, $gt, $gte, $lt, $lte, $ne, $in, $nin
☐ Logical query operators are $and, $or, $not, $nor
☐ Element (array) operators are $exists, $type
☐ Other noteworthy operators are $regex, $text, $geoWithin

See also: https://docs.mongodb.com/manual/reference/operator/

Read: finding your data

☐ Example: all documents with a sale value less than 100

> db.somecollection.find({"sale_value": {"$lt": 100} })

{ "_id" : "1002", "sale_value" : 75 }

{ "_id" : "1004", "sale_value" : 52 }

{ "_id" : "1008", "sale_value" : 95 }
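
Comparison operators such as $lt only match values of the same type as the operand, so numeric fields have to be compared against a number, not against a quoted string. The plain JavaScript sketch below imitates that behaviour on an in-memory array; `matchesLt` is a hypothetical helper and the document with _id "1003" is invented for the illustration.

```javascript
// Hypothetical imitation of a type-sensitive "$lt": values of a different
// type than the bound never match.
function matchesLt(value, bound) {
  return typeof value === typeof bound && value < bound;
}

const sales = [
  { _id: "1002", sale_value: 75 },
  { _id: "1003", sale_value: 120 }, // invented document, above the threshold
  { _id: "1004", sale_value: 52 },
  { _id: "1008", sale_value: 95 }
];

const withNumber = sales.filter(d => matchesLt(d.sale_value, 100)).map(d => d._id);
const withString = sales.filter(d => matchesLt(d.sale_value, "100")).map(d => d._id);

console.log(withNumber); // [ '1002', '1004', '1008' ]
console.log(withString); // []
```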

Read: sort and limit

☐ You can sort a result by appending the query with the sort function:

db.somecollection.find().sort({"_id": 1})

db.somecollection.find().sort({"_id": -1})

db.somecollection.find().sort({"_id": 1, "foo": -1})

See also: https://docs.mongodb.com/manual/reference/method/cursor.sort/

☐ Limiting a result is done similarly:

db.somecollection.find().limit(10)

See also: https://docs.mongodb.com/manual/reference/method/cursor.limit/

Read: exercise

☐ Prior to this exercise, import the zipcodes data set:
  $ mongoimport -d test -c zipcodes zips.json

1. Find the first document
2. Find the last document
3. Find the zipcodes with a population greater than 100,000

Read: exercise answers

1. Find the first document
   > db.zipcodes.findOne()
   > db.zipcodes.find().limit(1)
   { "_id" : "01001", "city" : "AGAWAM", "loc" : [ -72.622739, 42.070206 ], "pop" : 15338, "state" : "MA" }
2. Find the last document
   > db.zipcodes.find().sort({"_id": -1}).limit(1)
   { "_id" : "99950", "city" : "KETCHIKAN", "loc" : [ -133.18479, 55.942471 ], "pop" : 422, "state" : "AK" }
3. Find the zipcodes with a population greater than 100,000
   > db.zipcodes.find( { "pop": { "$gt": 100000}} )

Update: update method

☐ The update command enables you to update one or many documents

db.<collection>.update(query, update, options)

☐ Query: selection filter using query operators (same as find)
☐ Update: modification to apply
☐ Options: upsert, multi, and writeConcern
☐ SQL equivalent:
  UPDATE <collection> SET update WHERE query

Update: update operators

☐ Most important update operators:
  ☐ $set and $unset will update/remove the specified field(s)
  ☐ $inc and $mul will operate on the value of the field
  ☐ $rename will rename a field

See also: https://docs.mongodb.com/manual/reference/operator/update/
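
What these operators do to a document can be imitated on a plain JavaScript object. `applyUpdate` is a hypothetical helper that only handles top-level fields (no dot notation) and applies the operators one after the other; it is a sketch of the semantics, not how the server implements updates.

```javascript
// Hypothetical imitation of $set, $unset, $inc, $mul and $rename on top-level fields.
function applyUpdate(doc, update) {
  const result = { ...doc };
  for (const [field, value] of Object.entries(update.$set || {})) result[field] = value;
  for (const field of Object.keys(update.$unset || {})) delete result[field];
  for (const [field, amount] of Object.entries(update.$inc || {})) {
    result[field] = (result[field] || 0) + amount; // missing field starts at 0
  }
  for (const [field, factor] of Object.entries(update.$mul || {})) {
    result[field] = (result[field] || 0) * factor; // missing field becomes 0
  }
  for (const [from, to] of Object.entries(update.$rename || {})) {
    if (from in result) {
      result[to] = result[from];
      delete result[from];
    }
  }
  return result;
}

const before = { _id: 1, foo: "bar", counter: 10, old: "x" };
const after = applyUpdate(before, {
  $set: { foo: "barbar" },
  $inc: { counter: 2 },
  $rename: { old: "renamed" }
});

console.log(after); // { _id: 1, foo: 'barbar', counter: 12, renamed: 'x' }
```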

Update: the update options

☐ Upsert
  ☐ If the document exists: update, or else insert a new document
☐ Multi
  ☐ By default only one document gets updated
  ☐ Setting multi to true will update multiple documents at once

New in 3.2:
  db.<collection>.updateOne(query, update, options)
  db.<collection>.updateMany(query, update, options)
  db.<collection>.replaceOne(query, replacement, options)

Update: update durability

☐ Similar to the insert method:
  db.<collection>.update(query, update, { writeConcern: { w: <value>, j: <boolean>, wtimeout: <number> } })
☐ W option
  ☐ Number, “majority” or <tag>
☐ J option
  ☐ Setting to true will wait for the journal write
☐ Wtimeout option
  ☐ Timeout for the writeConcern

Update: example update

☐ Example:

db.somecollection.update(
  {"_id": ObjectId("57e171765ffbf76ca639bd65")},
  { $set: { "foo": "barbar", "array.1": "four" }, $inc: {"counter": 2} }
)

Update: example document update

☐ Example:

db.somecollection.update(
  {"_id": ObjectId("57e171765ffbf76ca639bd65")},
  { "replace": "all", "contents": ["with","this","new","document"] }
)

Update: save method

☐ The save method performs either an insert or update command

db.<collection>.save(document, writeConcern)

☐ If no _id field has been provided an insert will be performed
  ☐ The _id field will be filled with an ObjectID
☐ If an _id field has been provided an update will happen
  ☐ Update will be performed with upsert enabled
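
The insert-or-update decision of save can be imitated with an in-memory Map keyed by _id. This is a plain JavaScript sketch: `save` here is a hypothetical stand-in, the generated ids are simple strings rather than ObjectIDs, and the return values are labels chosen for the illustration.

```javascript
let nextId = 1; // stand-in for ObjectID generation

// Hypothetical imitation of save(): insert without _id, otherwise replace or upsert.
function save(collection, doc) {
  if (doc._id === undefined) {
    const inserted = { ...doc, _id: "generated-" + nextId++ };
    collection.set(inserted._id, inserted);
    return "inserted";
  }
  const existed = collection.has(doc._id);
  collection.set(doc._id, { ...doc }); // whole document is replaced
  return existed ? "replaced" : "upserted";
}

const somecollection = new Map();
console.log(save(somecollection, { foo: "bar" }));                // inserted
console.log(save(somecollection, { _id: "1234", foo: "bar" }));   // upserted
console.log(save(somecollection, { _id: "1234", foo: "other" })); // replaced
console.log(somecollection.size);                                 // 2
```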

Update: example save

☐ Example:
  db.somecollection.save({"foo": "bar"})
  db.somecollection.save({"_id": "1234", "foo": "bar"})

Update: example save complex

☐ Example:

> var doc=db.somecollection.findOne()
> doc
{ "_id" : ObjectId("57e16f925ffbf76ca639bd64"), "foo" : "bar" }
> doc["counter"]=0
0
> db.somecollection.save(doc)
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
> db.somecollection.findOne()

{

"_id" : ObjectId("57e16f925ffbf76ca639bd64"),

"foo" : "bar",

"counter" : 0

}

Update: exercise

1. Increase the population of zipcode 90210 (Beverly Hills) by 1
2. Iterate over the zipcodes collection and add a new field called “votes” with a value of 0

Update: exercise answers

1. Increase the population of zipcode 90210 (Beverly Hills) by 1
   > db.zipcodes.update( { "_id": "90210" }, { "$inc": {"pop": 1}})
   WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
2. Iterate over the zipcodes collection and add a new field called “votes” with a value of 0
   > var cur = db.zipcodes.find()
   > while (cur.hasNext()) { var zip = cur.next(); zip["votes"] = 0; db.zipcodes.save(zip); }
   WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 0 })

Delete: removing documents

☐ To remove documents you can use the following methods:
  db.<collection>.remove(query, options)
  db.<collection>.deleteOne(filter, options) // new in 3.2
  db.<collection>.deleteMany(filter, options) // new in 3.2
☐ Query and filter are basically the same
  ☐ Pass an empty document ( {} ) to match all documents
☐ Options are writeConcern and justOne (only for remove method)

Recap

☐ Creating databases / collections
☐ Dropping databases / collections
☐ CRUD
  ☐ Creating data
  ☐ Reading data
  ☐ Updating data
  ☐ Deleting data

Break: 10:30 - 11:00

Next hour: basic management and design patterns

Aggregations

CRUD bonus: aggregate

☐ The aggregate method fills the gaps for the find method
☐ Creates aggregates of matching documents
☐ Aggregate allows multiple pipeline stages
☐ Most important pipeline stages:
  ☐ $unwind: Unwinds arrays (e.g. like a JOIN)
  ☐ $group: Groups documents together (like GROUP BY)
  ☐ $match: Only match certain documents. When used in a later stage, after $group, it acts as a HAVING
  ☐ $sort: Sorts the result set (like ORDER BY)

Aggregate: group accumulators

☐ Most important accumulators:
  ☐ $sum
  ☐ $avg
  ☐ $min
  ☐ $max
  ☐ $push / $addToSet

Aggregate: match / group example

> db.somecollection.find()

{ "_id" : ObjectId("57e16f925ffbf76ca639bd64"), "foo" : "bar", "counter" : 1 }

{ "_id" : "1234", "foo" : "bar" }

{ "_id" : "2345", "foo" : "barbar" }

{ "_id" : "foobar", "foo" : "bar" }

> db.somecollection.aggregate([
  { $match: {"foo": "bar"}},
  { $group: { "_id": "$foo", "count": {$sum: 1}}}
])

{ "_id" : "bar", "count" : 3 }

Aggregate: unwind / group example (1)

> db.someothercol.find()

{ "_id" : 1, "name" : "blogpost 1", "tags" : [ "music", "literature" ] }

{ "_id" : 2, "name" : "blogpost 2", "tags" : [ "dogs", "cats", "kittens" ] }

{ "_id" : 3, "name" : "blogpost 3", "tags" : [ "memes" ] }

{ "_id" : 4, "name" : "blogpost 4", "tags" : [ "memes", "kittens" ] }

Aggregate: unwind / group example (2)

> db.someothercol.aggregate([{$unwind: "$tags"}])

{ "_id" : 1, "name" : "blogpost 1", "tags" : "music" }

{ "_id" : 1, "name" : "blogpost 1", "tags" : "literature" }

{ "_id" : 2, "name" : "blogpost 2", "tags" : "dogs" }

{ "_id" : 2, "name" : "blogpost 2", "tags" : "cats" }

{ "_id" : 2, "name" : "blogpost 2", "tags" : "kittens" }

{ "_id" : 3, "name" : "blogpost 3", "tags" : "memes" }

{ "_id" : 4, "name" : "blogpost 4", "tags" : "memes" }

{ "_id" : 4, "name" : "blogpost 4", "tags" : "kittens" }

Aggregate: unwind / group example (3)

db.someothercol.aggregate([

{$unwind: "$tags"},

{$group: {"_id": "$tags", "count": {$sum: 1}}},

{$sort: {"count": 1}}

])

{ "_id" : "cats", "count" : 1 }

{ "_id" : "dogs", "count" : 1 }

{ "_id" : "literature", "count" : 1 }

{ "_id" : "music", "count" : 1 }

{ "_id" : "memes", "count" : 2 }

{ "_id" : "kittens", "count" : 2 }
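
The same unwind, group and sort steps can be reproduced on a plain JavaScript array to see what each stage does. This is only an in-memory imitation of the pipeline using the blogpost documents from the example; the variable names are chosen for the illustration.

```javascript
const posts = [
  { _id: 1, name: "blogpost 1", tags: ["music", "literature"] },
  { _id: 2, name: "blogpost 2", tags: ["dogs", "cats", "kittens"] },
  { _id: 3, name: "blogpost 3", tags: ["memes"] },
  { _id: 4, name: "blogpost 4", tags: ["memes", "kittens"] }
];

// $unwind: one output document per array element.
const unwound = posts.flatMap(post => post.tags.map(tag => ({ ...post, tags: tag })));

// $group with $sum: 1, counting documents per tag.
const counts = new Map();
for (const doc of unwound) {
  counts.set(doc.tags, (counts.get(doc.tags) || 0) + 1);
}

// $sort on count, ascending.
const result = [...counts]
  .map(([tag, count]) => ({ _id: tag, count: count }))
  .sort((a, b) => a.count - b.count);

console.log(unwound.length); // 8
console.log(result);
```

Tags with the same count may appear in a different order than in the slide output, since the sort only looks at the count.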

Aggregate: exercise

1. From the zipcodes collection, calculate the total population per city and sort by total population descending

2. From the zipcodes collection, calculate the average population per zipcode of New York (hint: the _id field is the zipcode)

Aggregate: exercise answers

☐ From the zipcodes collection, calculate the total population per city and sort by total population descending

db.zipcodes.aggregate([
  { $group: { "_id": "$city", "total_pop": {$sum: "$pop"}} },
  { $sort: { "total_pop": -1} }
])

☐ From the zipcodes collection, calculate the average population per zipcode of New York

db.zipcodes.aggregate([
  { $match: { "city": "NEW YORK"} },
  { $group: { "_id": "$city", "average_pop": {$avg: "$pop"}} }
])

Basic Management

Exporting data

☐ Exporting data can be done via mongoexport
☐ Format limitations
  ☐ JSON or CSV
  ☐ BSON rich documents are not supported
    ☐ binData
    ☐ objectId
    ☐ Date
  ☐ Some tricks are applied when using mongoimport
☐ In general: not a reliable way to make backups!
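
Why plain JSON cannot carry the richer BSON types is easy to see with a round trip in plain JavaScript. The sketch below uses only the built-in JSON functions; the field name `created` is made up for the illustration.

```javascript
// A Date survives in memory, but plain JSON has no date type.
const original = { _id: 1, created: new Date("2016-09-27T15:10:01Z") };
const roundTripped = JSON.parse(JSON.stringify(original));

console.log(original.created instanceof Date); // true
console.log(typeof roundTripped.created);      // string
console.log(roundTripped.created);             // 2016-09-27T15:10:01.000Z
```

The export tools work around this by wrapping such values in marker objects, which is what the {"$oid": ...} wrapper around ObjectId values in mongoexport output does.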

Exporting data to JSON

☐ Example:
  mongoexport -d test -c mytest -o mytest.json
☐ Contents should be similar to this:

{"_id":1.0,"name":"mytest1"}

{"_id":2.0,"name":"mytest2","testdata":"test1234"}

{"_id":{"$oid":"57ea8420506638730683f57c"},"name":"mytest3","testdata":"test1234"}

{"_id":{"$oid":"57ea8420506638730683f57d"},"name":"mytest4","testdata":"test1234"}

{"_id":{"$oid":"57ea8420506638730683f57e"},"name":"mytest5","testdata":"test1234"}

Importing data

☐ Importing data can be done via mongoimport
  ☐ Counterpart of mongoexport
☐ Format limitations
  ☐ JSON or CSV
☐ Example:
  mongoimport -d test -c mytest mytest.json

Types of backups

☐ Logical backups
  ☐ Dump of your data
☐ Physical backups
  ☐ File(system) copy of your data

Logical backups

☐ Mongodump
☐ MongoDB Backup
☐ Mongob

Logical backups: mongodump

☐ Mongodump
  ☐ BSON dump of the data
  ☐ BSON files per database / collection
  ☐ Archive
☐ OEM tool
  ☐ Works great but needs some wrapping

Logical backups: MongoDB Backup

☐ MongoDB Backup
  ☐ https://www.npmjs.com/package/mongodb-backup
  ☐ Nodejs backup solution
  ☐ CLI and API
  ☐ Can stream backups

Logical backups: Mongob

☐ Mongob
  ☐ https://github.com/cmpitg/mongob
  ☐ Python based CLI tool
  ☐ MongoDB instance or bz2 target
  ☐ Can copy data between collections
  ☐ Incremental backups
  ☐ Rate limiting

Physical backups: Filesystem snapshots

☐ Filesystem snapshots
  ☐ LVM
  ☐ ZFS
  ☐ XFS (xfs_freeze)
  ☐ EBS

Physical backups: Strata

☐ MongoRocks Strata
  ☐ https://github.com/facebookgo/rocks-strata
  ☐ Backs up on file level
  ☐ Supports incremental backups
  ☐ Queryable backups

Restore

☐ To restore from a mongodump, use mongorestore

Exercise: backup using mongodump

1. Create a backup using mongodump
2. Log into MongoDB and drop a collection
3. Restore the collection using the dump created earlier

Exercise: backup using mongodump

1. Create a backup using mongodump
   $ mongodump --gzip --archive=dump.gz
2. Log into MongoDB and drop a collection
   > db.inserttest.drop()
3. Restore the collection using the dump created earlier
   $ mongorestore --port 27003 -d test -c inserttest --gzip --archive=dump.gz
   2016-09-27T19:33:19.574+0200 creating intents for archive
   2016-09-27T19:33:19.704+0200 reading metadata for test.inserttest from archive 'dump.gz'
   2016-09-27T19:33:19.731+0200 restoring test.inserttest from archive 'dump.gz'
   2016-09-27T19:33:40.624+0200 restoring indexes for collection test.inserttest from metadata
   2016-09-27T19:33:40.635+0200 finished restoring test.inserttest (146411 documents)
   2016-09-27T19:33:40.635+0200 done

Schema design patterns

Normalized data

{

_id: "@percona",

name: "Percona Twitter account"

}

{

twitter_id: "@percona",

joined: ISODate("2009-04-02"),

location: "Raleigh, NC 27617"

}

Embedded document (One-to-one)

{

_id: "@percona",

name: "Percona Twitter account",

info: {

joined: ISODate("2009-04-02"),

location: "Raleigh, NC 27617"

}

}

Embedded document (One-to-many)

{
  _id: "@percona",
  name: "Percona Twitter account",
  lasttweets: [{
    tweet_id: 780892298024456193,
    tweettime: ISODate("2016-09-27T15:10:01"),
    tweet: "Wed 11am PT @PeterZaitsev will go over highlights from the @Percona open source software roadmap and time for Q/A http://hubs.ly/H04wLsv0"
  },{
    tweet_id: 780874621386158080,
    tweettime: ISODate("2016-09-27T13:59:23"),
    tweet: "Problems solved, before they appear! Come to #PerconaLive to get hands on training and more. http://hubs.ly/H04rr-J0"
  }]
}

How not to use embedded documents!

{
  _id: 780892298024456193,
  tweettime: ISODate("2016-09-27T15:10:01"),
  tweet: "Wed 11am PT @PeterZaitsev will go over highlights from the @Percona open source software roadmap and time for Q/A http://hubs.ly/H04wLsv0",
  twitterhandle: {
    name: "Percona Twitter account",
    info: {
      joined: ISODate("2009-04-02"),
      location: "Raleigh, NC 27617"
    }
  }
}

Document references (One-to-many)

{
  _id: "@percona",
  name: "Percona Twitter account",
  info: { joined: ISODate("2009-04-02"), location: "Raleigh, NC 27617" }
}

{
  _id: 780892298024456193,
  tweettime: ISODate("2016-09-27T15:10:01"),
  tweet: "Wed 11am PT @PeterZaitsev will go over highlights from the @Percona open source software roadmap and time for Q/A http://hubs.ly/H04wLsv0",
  twitterhandle: "@percona"
}

Impact of various data models

☐ Document growth
  ☐ Reallocation of the same document impacts performance
  ☐ Writing to the same document often creates hotspots

Impact of various data models

[Diagram: documents 1 to 5 stored in sequence, with document 2 relocated]

Impact of various data models

☐ Atomicity
  ☐ No single write operation can change more than one document
  ☐ Writing to multiple documents is not atomic
  ☐ Write all changes to a single document at the same time

Impact of various data models

{
  _id: "@percona",
  name: "Percona Twitter account",
  lasttweets: [{
    tweet_id: 780892298024456193,
    tweettime: ISODate("2016-09-27T15:10:01"),
    tweet: "Wed 11am PT @PeterZaitsev will go over highlights from the @Percona open source software roadmap and time for Q/A http://hubs.ly/H04wLsv0"
  },{
    tweet_id: 780874621386158080,
    tweettime: ISODate("2016-09-27T13:59:23"),
    tweet: "Problems solved, before they appear! Come to #PerconaLive to get hands on training and more. http://hubs.ly/H04rr-J0"
  }]
}

Impact of various data models

☐ Sharding
  ☐ Sharding documents requires a shard key
  ☐ Choosing the right shard key is the start of your document structure
  ☐ Choosing the wrong shard key may impact performance

Impact of various data models

☐ Example: which field to use as a shard key?

{
  _id: 780892298024456193,
  tweettime: ISODate("2016-09-27T15:10:01"),
  tweet: "Wed 11am PT @PeterZaitsev will go over highlights from the @Percona open source software roadmap and time for Q/A http://hubs.ly/H04wLsv0",
  twitterhandle: "@percona"
}

Impact of various data models

☐ Indexes
  ☐ Every index consumes disk space and memory
  ☐ Each new index has a negative impact on write performance
  ☐ A high read-to-write ratio will benefit from indexes
  ☐ A high write-to-read ratio will benefit from having fewer indexes

Impact of various data models

☐ Number of collections
  ☐ Having many collections has no performance penalty
  ☐ Having many collections will improve performance (concurrency)
  ☐ MMAPv1 is limited in the number of namespaces
☐ Large number of (small) documents
  ☐ Can give more random disk access

Recap

☐ Aggregations
  ☐ Why you need to know about them
☐ Basic management
  ☐ Import/export data
  ☐ Backup/restore
☐ Schema design patterns

Exercise: setting up the env

☐ Set up the environment for this afternoon
  ☐ Clone git repo
  ☐ Create a cluster by running the following command:
    ./build_process.sh
  ☐ This should build a sharded cluster using Percona Server MongoDB

Lunch: 12:00 - 13:30

See you at the afternoon session !