MongoDB: a gentle, friendly overview
My talk at CRS4: an overview of MongoDB
Antonio Pintus
CRS4, 08/09/2011
NOSQL /1
• MongoDB belongs to the NoSQL database family:
• non-relational
• document-oriented
• no predefined, rigid database schemas
• no joins
• horizontal scalability
NOSQL /2
• The NoSQL family includes several database types:
• document-oriented: MongoDB, CouchDB, ...
• key-value / tuple stores: Redis, ...
• graph databases: Neo4j, ...
• ...
MongoDB
• Performant: written in C++
• Schema-free
• Full index support
• No transactions
• Scalable: replication + sharding
• document-based queries
• Map/Reduce
• GridFS
• a JavaScript interactive shell
SCHEMA-FREE
• Schema-free collections = NO TABLES!
• A Mongo deployment (server) holds a set of databases
• A database holds a set of collections
• A collection holds a set of documents
• A document is a set of fields; each field is a key-value pair (stored as BSON)
• A key is a name (string); a value is a basic type (string, integer, float, timestamp, binary, etc.), a document, or an array of values
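As a rough illustration of the hierarchy above, a document can be pictured as a Python dictionary whose values are basic types, an embedded document, or an array. This is only a sketch: the field names and values are made up for the example, and a plain dict merely stands in for a BSON document.

```python
from datetime import datetime

# A hypothetical document: keys are strings, values are basic types,
# an embedded document, or an array of values.
document = {
    "shortname": "SampleSensor",           # string
    "counter": 3,                          # integer
    "ratio": 0.75,                         # float
    "created": datetime(2011, 9, 8),       # timestamp
    "location": {"lat": 39.0, "lon": 9.0},  # embedded document
    "tags": ["sensor", "public"],          # array of values
}

# Pictured the same way, a collection is a list of documents and a
# database is a mapping from collection names to collections.
database = {"mycollection": [document]}

for key, value in document.items():
    print(key, type(value).__name__)
```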
DATA FORMAT
• document-oriented
• stores JSON-style documents as BSON (Binary JSON):
• JSON plus additional data types, e.g., a Date type and a BinData type
• Can reference other documents
• lightweight, traversable, efficient
BSON
{
  "_id" : ObjectId("4dcec9a0af391a0d53000003"),
  "servicetype" : "sensor",
  "description" : "it’s only rock’n’roll but I like it",
  "policy" : "PUBLIC",
  "owner" : "User001",
  "date_created" : "2011-05-02 17:11:28.874086",
  "shortname" : "SampleSensor",
  "content-type" : "text/plain",
  "icon" : "http://myserver.com/images/sens.png"
}
COLLECTIONS
• Roughly the same concept as a “table”, but dynamic and schema-free
• a collection of BSON documents
• documents in the same collection can have heterogeneous structures
QUERIES
• query by document
• Examples (using the interactive shell):
• db.mycollection.find( {"policy" : "PUBLIC"} );
• db.mycollection.findOne( {"policy" : "PUBLIC", "owner" : "User001"} );
• db.mycollection.find( {"policy" : "PUBLIC", "owner" : "User001"} ).limit(2);
• db.mycollection.find( {"policy" : "PUBLIC"}, {"shortname" : 1} );
• db.mycollection.find( {"counter" : {$gt : 2}} );
• conditional operators: $lt (<), $lte (<=), $gt (>), $gte (>=), $and, $in, $or, $nor, ...
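To make "query by document" concrete, here is a minimal pure-Python sketch of how a query document could be checked against a stored document. It is an illustration only, not how the server evaluates queries: the matches function is hypothetical, it covers just equality, $gt and $in, and the sample documents are invented.

```python
def matches(document, query):
    """Return True if `document` satisfies every condition in `query`."""
    for field, condition in query.items():
        value = document.get(field)
        if isinstance(condition, dict):
            # Operator form, e.g. {"counter": {"$gt": 2}}
            for op, operand in condition.items():
                if op == "$gt":
                    if value is None or not value > operand:
                        return False
                elif op == "$in":
                    if value not in operand:
                        return False
                else:
                    raise ValueError("operator not covered by this sketch: " + op)
        elif value != condition:
            # Plain form, e.g. {"policy": "PUBLIC"}, means equality
            return False
    return True


docs = [
    {"shortname": "Arduino", "policy": "PUBLIC", "owner": "User001", "counter": 5},
    {"shortname": "Thermo", "policy": "PRIVATE", "owner": "User001", "counter": 1},
    {"shortname": "Lamp", "policy": "PUBLIC", "owner": "User002", "counter": 3},
]

mine = [d["shortname"] for d in docs
        if matches(d, {"policy": "PUBLIC", "owner": "User001"})]
busy = [d["shortname"] for d in docs if matches(d, {"counter": {"$gt": 2}})]
print(mine)  # ['Arduino']
print(busy)  # ['Arduino', 'Lamp']
```

All conditions in a query document must hold at once, which is why the two-field query behaves like an implicit AND.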
INDEXES
• Full index support: index on any attribute (including compound indexes on multiple attributes)
• indexes increase query performance
• indexes are implemented as B-Tree indexes
• each index adds overhead to inserts and deletes, so don’t overuse them!
• db.mycollection.ensureIndex( {"servicetype" : 1} );
• db.mycollection.ensureIndex( {"servicetype" : 1, "owner" : -1} );
• db.mycollection.getIndexes()
• db.system.indexes.find()
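The trade-off between faster queries and slower writes can be sketched in a few lines of Python. Note the substitution: this uses a sorted list with binary search, not a B-Tree, and the SortedIndex class is hypothetical. It only shows the general idea that an index keeps keys ordered, which costs extra work on every insert.

```python
import bisect


class SortedIndex:
    """Keeps (key, position) pairs sorted so lookups can use binary search."""

    def __init__(self):
        self._entries = []

    def add(self, key, position):
        # Extra work on every insert: this is the write overhead of an index.
        bisect.insort(self._entries, (key, position))

    def find(self, key):
        # Binary search for the first entry with this key, then scan the run.
        i = bisect.bisect_left(self._entries, (key,))
        positions = []
        while i < len(self._entries) and self._entries[i][0] == key:
            positions.append(self._entries[i][1])
            i += 1
        return positions


collection = [
    {"servicetype": "sensor"},
    {"servicetype": "actuator"},
    {"servicetype": "sensor"},
]

index = SortedIndex()
for position, doc in enumerate(collection):
    index.add(doc["servicetype"], position)

print(index.find("sensor"))  # [0, 2]
```

Without the index, finding every "sensor" document means scanning the whole collection; with it, the lookup touches only the matching entries.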
INSERTS
• Simplicity
• db.mycollection.insert({"a":"abc",...})
• var doc = {"name":"mongodb",...};
• db.mycollection.insert(doc);
UPDATES
1. replace entire document
2. atomic, in-place updates
• db.collection.update( criteria, objNew, upsert, multi )
• criteria: the query
• objNew: updated object or $ operators (e.g., $inc, $set) which manipulate the object
• upsert: if no matching record exists, insert one
• multi: whether all documents matching the criteria should be updated (by default only the first match is updated)
• db.collection.save(...): single object update with upsert
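The roles of the upsert and multi flags can be sketched in plain Python. This is a simplified model, not the server's behaviour in full: the update function below is hypothetical, it supports only equality criteria, and it treats the new values as if they were passed through $set.

```python
def update(collection, criteria, set_fields, upsert=False, multi=False):
    """Model of update(criteria, {"$set": set_fields}, upsert, multi).

    Returns the number of documents written.
    """
    matched = [
        doc for doc in collection
        if all(doc.get(key) == value for key, value in criteria.items())
    ]
    if not matched:
        if upsert:
            # Nothing matched: insert a document built from the criteria
            # plus the fields being set.
            new_doc = dict(criteria)
            new_doc.update(set_fields)
            collection.append(new_doc)
            return 1
        return 0
    if not multi:
        matched = matched[:1]  # default: only the first match is updated
    for doc in matched:
        doc.update(set_fields)
    return len(matched)


coll = [
    {"owner": "User001", "policy": "PUBLIC"},
    {"owner": "User001", "policy": "PUBLIC"},
]

first_only = update(coll, {"owner": "User001"}, {"policy": "PRIVATE"})
all_docs = update(coll, {"owner": "User001"}, {"policy": "HIDDEN"}, multi=True)
inserted = update(coll, {"owner": "User999"}, {"policy": "PUBLIC"}, upsert=True)
print(first_only, all_docs, inserted, len(coll))
```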
UPDATES /2
• atomic, in-place updates = highly efficient
• provides special operators
• db.mycollection.update( { "shortname" : "Arduino" }, { $inc : { n : 1 } } );
• db.mycollection.update( { "shortname" : "Arduino" }, { $set : { "shortname" : "OldArduino" } } );
• other atomic ops: $unset, $push, $pushAll, $addToSet, $pop, $pull, $rename, ...
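What the operators do to a document can be sketched in plain Python. This is only an illustration of the semantics of four operators on a dict: the apply_update function is hypothetical, and it says nothing about how the server applies them atomically.

```python
def apply_update(document, update):
    """Apply a small subset of update operators to `document` in place."""
    for op, fields in update.items():
        for field, value in fields.items():
            if op == "$set":
                document[field] = value
            elif op == "$inc":
                # A missing field is treated as 0 before incrementing.
                document[field] = document.get(field, 0) + value
            elif op == "$unset":
                document.pop(field, None)
            elif op == "$push":
                # A missing field becomes a new one-element array.
                document.setdefault(field, []).append(value)
            else:
                raise ValueError("operator not covered by this sketch: " + op)
    return document


doc = {"shortname": "Arduino"}
apply_update(doc, {"$inc": {"n": 1}})
apply_update(doc, {"$inc": {"n": 1}})
apply_update(doc, {"$set": {"shortname": "OldArduino"}})
apply_update(doc, {"$push": {"tags": "legacy"}})
print(doc)
```

Only the named fields change; the rest of the document is left untouched, which is what makes these updates cheaper than replacing the whole document.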
Mongo DISTRIBUTION
• Mac, Linux, Solaris, Windows
• mongod: database server.
• Defaults: port=27017, data path=/data/db
• Override with the --dbpath and --port command-line options
• mongo: interactive JavaScript shell
• mongos: sharding controller server
MISCELLANEOUS: REST
• mongod provides a basic REST interface
• launch mongod with the --rest option; default port=28017
• http://localhost:28017/mydb/mycollection/
• http://localhost:28017/mydb/mycollection/?filter_shortname=Arduino
• http://localhost:28017/mydb/mycollection/?filter_shortname=Arduino&limit=10
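The URL pattern in these examples (a filter_ prefix on field names, plus an optional limit) can be generated with a small helper. This is a sketch: rest_url is a hypothetical function, it only builds the URL strings shown above, and it does not contact a server.

```python
from urllib.parse import urlencode


def rest_url(database, collection, filters=None, limit=None,
             host="localhost", port=28017):
    """Build a URL following the pattern shown on this slide."""
    params = [("filter_" + field, value)
              for field, value in (filters or {}).items()]
    if limit is not None:
        params.append(("limit", limit))
    url = "http://{}:{}/{}/{}/".format(host, port, database, collection)
    if params:
        url += "?" + urlencode(params)
    return url


print(rest_url("mydb", "mycollection"))
print(rest_url("mydb", "mycollection", {"shortname": "Arduino"}, limit=10))
```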
GOOD FOR
• event logging
• high-performance small reads/writes
• Web: real-time inserts, updates, and queries. Auto-sharding (scalability) and replication are provided.
• Real-time stats/analytics
LESS GOOD FOR
• Systems with a heavily transactional nature
• Traditional Business Intelligence
• (obviously) Systems and problems requiring SQL
SHARDING /1
• Horizontal scalability: MongoDB auto-sharding
• partitioning by keys
• auto-balancing
• easy addition of new servers
• no single point of failure
• automatic failover / replica sets
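"Partitioning by keys" can be pictured as assigning each shard a range of key values. The sketch below is a deliberately simplified model with hypothetical split points and shard names. It leaves out auto-balancing, failover, and everything else the real system does.

```python
import bisect

# Hypothetical split points on the shard key: each shard owns one key range.
#   shard0: key < "h"     shard1: "h" <= key < "q"     shard2: key >= "q"
SPLIT_POINTS = ["h", "q"]
SHARDS = ["shard0", "shard1", "shard2"]


def shard_for(key):
    """Return the shard whose key range contains `key`."""
    return SHARDS[bisect.bisect_right(SPLIT_POINTS, key)]


placement = {}
for shortname in ["arduino", "lamp", "thermo", "hub"]:
    placement.setdefault(shard_for(shortname), []).append(shortname)

print(placement)
# {'shard0': ['arduino'], 'shard1': ['lamp', 'hub'], 'shard2': ['thermo']}
```

In this model, adding a server amounts to introducing a new split point and moving the documents in the affected range.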
SHARDING /2
[Diagram: the Client connects to mongos processes, which sit in front of the Shards (groups of mongod processes) and the Config servers (mongod processes)]
DRIVERS
• C# and .NET
• C, C++
• Erlang, Perl
• Haskell
• Java, JavaScript
• PHP
• Python, Ruby, Delphi
• Scala
• Clojure
• Go, Objective-C
• Smalltalk
• ...
PyMongo
• Recommended MongoDB driver for the Python language
• An easy way to install it (Mac, Linux):
• easy_install pymongo
• easy_install -U pymongo
QUICK-START: INSERT
• (obviously) mongod must be running ;-)
import pymongo
from pymongo import Connection  # later PyMongo releases name this class MongoClient
conn = Connection()  # default localhost:27017; or conn = Connection('myhost', 9999)
db = conn['test_db']  # gets the database
test_coll = db['testcoll']  # gets the desired collection
doc = {"name": "slides.txt", "author": "Antonio", "type": "text", "tags": ["mongodb", "python", "slides"]}  # a dict
test_coll.insert(doc)  # inserts the document into the collection
• lazy creation: collections and databases are created when the first document is inserted into them
QUICK-START: QUERY
res = test_coll.find_one()  # gets one document
query = {"author": "Antonio"}  # a query document
res = test_coll.find_one(query)  # searches for one document
for doc in test_coll.find(query):  # using Cursors on multiple docs
    print(doc)
    # ...
test_coll.count()  # counts the docs in the collection
NOT COVERED (HERE)
• GridFS: a single document is limited to 16 MB, so GridFS transparently splits large files across multiple documents
• MapReduce: batch processing of data and aggregation operations
• GeoSpatial Indexing: two-dimensional indexing for location-based queries (e.g., retrieve the n closest restaurants to my location)
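The idea behind GridFS, splitting a large file across several documents, can be sketched in pure Python. This is an illustration under simplifying assumptions: split_into_chunks and reassemble are hypothetical helpers, the chunk documents keep only a sequence number and the data, and the chunk size is tiny so the example is easy to follow.

```python
def split_into_chunks(data, chunk_size):
    """Split `data` (bytes) into numbered chunk documents."""
    return [
        {"n": index, "data": data[offset:offset + chunk_size]}
        for index, offset in enumerate(range(0, len(data), chunk_size))
    ]


def reassemble(chunks):
    """Rebuild the original bytes from chunk documents, in any order."""
    ordered = sorted(chunks, key=lambda chunk: chunk["n"])
    return b"".join(chunk["data"] for chunk in ordered)


payload = b"0123456789"
chunks = split_into_chunks(payload, 4)
print([len(chunk["data"]) for chunk in chunks])  # [4, 4, 2]
print(reassemble(reversed(chunks)) == payload)   # True
```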
IN PRODUCTION (some...)
Paraimpu LOVES MongoDB
• MongoDB powers Paraimpu, our Social Web of Things tool
• great data heterogeneity
• thousands of small, real-time data inserts/queries
• performance
• horizontal scalability
• ease of use: development is fun!
REFERENCES
• http://www.mongodb.org/
• http://www.mongodb.org/display/DOCS/Manual
• http://www.mongodb.org/display/DOCS/Slides+and+Video
• pymongo: http://api.mongodb.org/python/
• Paraimpu: http://paraimpu.crs4.it