EuroPython 2016: A Deep Dive into the PyMongo Driver
A Deep Dive into the Pymongo Driver
Joe Drumgoole
Director of Developer Advocacy, EMEA
21-July-2016
V1.0
2
MongoDB
MongoDB Query Language (MQL) + Native Drivers
MongoDB Document/JSON Data Model
Storage engines: WiredTiger, MMAP, In-memory, Encrypted, 3rd party
Management
Security
Sharded Clusters
Replica Sets
3
Drivers and Frameworks
Morphia
MEAN Stack
4
BSON Side Bar
• MongoDB uses a binary format of JSON called BSON (Binary JSON)
• Adds type and size information
• Allows efficient parsing and skipping
• You can use MongoDB drivers without ever knowing that BSON exists
• Open standard (http://bsonspec.org/, licensed under the Creative Commons)
• There are BSON libraries in every driver if you fancy trying it out
• Similar to Google Protocol Buffers
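As a rough illustration of the "type and size information" point, the sketch below hand-encodes a document that contains only string values, following the layout published at bsonspec.org. The helper names are invented for this example and it handles no other types; in practice the bson package that ships with PyMongo does this work.

```python
import struct

def encode_string_element(name, value):
    """Encode one string field: type byte, field name, length, value."""
    data = value.encode("utf-8") + b"\x00"        # string values are NUL-terminated
    return (
        b"\x02"                                   # type tag 0x02 = UTF-8 string
        + name.encode("utf-8") + b"\x00"          # field name as a C string
        + struct.pack("<i", len(data))            # little-endian int32 byte length
        + data
    )

def encode_document(doc):
    """Encode a dict of str -> str as a BSON document."""
    body = b"".join(encode_string_element(k, v) for k, v in doc.items())
    total = 4 + len(body) + 1                     # size prefix + elements + terminator
    return struct.pack("<i", total) + body + b"\x00"

encoded = encode_document({"hello": "world"})
print(len(encoded), encoded)
```

The leading int32 holds the total document size, which is what lets a parser skip a document or a string without scanning its contents.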
5
Single Server
Driver
Mongod
6
Replica Set
Driver
Secondary Secondary
Primary
7
Replica Set Primary Failure
Driver
Secondary Secondary
8
Replica Set Election
Driver
Secondary Secondary
9
Replica Set New Primary
Driver
Primary Secondary
10
Replica Set Recovery
Driver
Primary Secondary
Secondary
11
Sharded Cluster
Driver
Mongod Mongod
Mongod
Mongod Mongod
Mongod
Mongod Mongod
Mongod
mongos mongos
12
Driver Responsibilities
https://github.com/mongodb/mongo-python-driver
Driver
Authentication & Security
Python <-> BSON
Error handling & Recovery
Wire Protocol
Topology Management
Connection Pool
14
Example API Calls
import pymongo
client = pymongo.MongoClient( host="localhost", port=27017 )
database = client[ 'test_database' ]
collection = database[ 'test_collection' ]
collection.insert_one( { "hello" : "world", "goodbye" : "world" } )
collection.find_one( { "hello" : "world" } )
collection.update_one( { "hello" : "world" }, { "$set" : { "buenos dias" : "world" } } )
collection.delete_one( { "hello" : "world" } )
15
Start MongoClient
c = MongoClient( "host1,host2", replicaSet="replset" )
16
Client Side View
Secondary host2
Secondary host3
Primary host1
MongoClient
MongoClient( "host1,host2", replicaSet="replset" )
17
Client Side View
Secondary host2
Secondary host3
Primary host1
MongoClient
Monitor Thread 1
Monitor Thread 2
{ ismaster : False, secondary: True, hosts : [ host1, host2, host3 ] }
18
What Does ismaster show?
>>> pprint.pprint( db.command( "ismaster" ) )
{u'hosts': [u'JD10Gen-old.local:27017',
            u'JD10Gen-old.local:27018',
            u'JD10Gen-old.local:27019'],
 u'ismaster': False,
 u'secondary': True,
 u'setName': u'replset',
 …}
>>>
19
Topology
Current Topology + ismaster -> New Topology
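A minimal sketch of that idea, assuming a much simpler model than the real server discovery and monitoring spec: the topology is a plain dict, and the function name and the "Unknown" / "Primary" / "Secondary" labels are illustrative rather than PyMongo's internal types.

```python
def update_topology(topology, address, ismaster):
    """Return a new topology description after one ismaster reply from `address`."""
    servers = dict(topology["servers"])           # copy: the old description is left untouched
    # Every member listed in the reply becomes a known server.
    for host in ismaster.get("hosts", []):
        servers.setdefault(host, "Unknown")
    if ismaster.get("ismaster"):
        # Only one primary at a time: forget any previous one.
        for host in list(servers):
            if servers[host] == "Primary":
                servers[host] = "Unknown"
        servers[address] = "Primary"
    elif ismaster.get("secondary"):
        servers[address] = "Secondary"
    else:
        servers[address] = "Unknown"
    return {
        "set_name": ismaster.get("setName", topology["set_name"]),
        "servers": servers,
    }

current = {"set_name": "replset", "servers": {"host1": "Unknown", "host2": "Unknown"}}
reply = {"ismaster": False, "secondary": True, "hosts": ["host1", "host2", "host3"]}
new = update_topology(current, "host2", reply)
print(new["servers"])
```

Here a single reply from host2 is enough for the client to learn that host3 exists, even though it was not in the seed list.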
20
Client Side View
Secondary host2
Secondary host3
Primary host1
MongoClient
Monitor Thread 1
Monitor Thread 2 ✔
21
Client Side View
Secondary host2
Secondary host3
Primary host1
MongoClient
Monitor Thread 1
Monitor Thread 2 ✔
Monitor Thread 3
22
Client Side View
Secondary host2
Secondary host3
Primary host1
MongoClient
Monitor Thread 1
Monitor Thread 2 ✔
Monitor Thread 3
Your Code
23
Next Is Insert
c = MongoClient( "host1,host2", replicaSet="replset" )
c.db.col.insert_one( { "a" : "b" } )
24
Insert Will Block
Secondary host2
Secondary host3
Primary host1
MongoClient
Monitor Thread 1
Monitor Thread 2 ✔
Monitor Thread 3
Your Code
Insert
25
ismaster response from Host 1
Secondary host2
Secondary host3
Primary host1
MongoClient
Monitor Thread 1
Monitor Thread 2 ✔
Monitor Thread 3
Your Code
Insert
ismaster
26
Now Write Can Proceed
Secondary host2
Secondary host3
Primary host1
MongoClient
Monitor Thread 1
Monitor Thread 2 ✔
Monitor Thread 3
Your Code
Insert
✔
Insert
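The blocking behaviour in the last few slides can be sketched with the standard library alone. This is a toy model, not PyMongo's implementation: the Topology class, the canned replies, and the delays are all invented for the example, and each monitor checks its host once instead of polling.

```python
import threading
import time

class Topology:
    """Shared state written by monitor threads and read by application code."""
    def __init__(self):
        self._cond = threading.Condition()
        self._primary = None

    def record(self, host, ismaster):
        with self._cond:
            if ismaster["ismaster"]:
                self._primary = host
            self._cond.notify_all()               # wake any operation waiting for a primary

    def select_primary(self, timeout):
        """Block until a primary is known, or raise after `timeout` seconds."""
        with self._cond:
            found = self._cond.wait_for(lambda: self._primary is not None, timeout)
            if not found:
                raise TimeoutError("no primary found within %.2fs" % timeout)
            return self._primary

# Canned (delay, reply) pairs standing in for real network round trips.
FAKE_REPLIES = {
    "host1": (0.10, {"ismaster": True}),
    "host2": (0.01, {"ismaster": False}),
    "host3": (0.20, {"ismaster": False}),
}

def monitor(topology, host):
    delay, reply = FAKE_REPLIES[host]
    time.sleep(delay)                             # pretend to wait for the server
    topology.record(host, reply)

topology = Topology()
for host in FAKE_REPLIES:
    threading.Thread(target=monitor, args=(topology, host), daemon=True).start()

# "Your Code": the insert waits here until host1 has answered.
primary = topology.select_primary(timeout=5.0)
print("insert can proceed against", primary)
```

The write does not wait for every monitor; it proceeds as soon as one of them has identified the primary, and host3's reply simply arrives later.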
27
Later Host 3 Responds
Secondary host2
Secondary host3
Primary host1
MongoClient
Monitor Thread 1
Monitor Thread 2 ✔
Monitor Thread 3
Your Code
✔
✔
28
Steady State
Secondary host2
Secondary host3
Primary host1
MongoClient
Monitor Thread 1
Monitor Thread 2 ✔
Monitor Thread 3
Your Code
✔
✔
29
Life Intervenes
Secondary host2
Secondary host3
Primary host1
MongoClient
Monitor Thread 1
Monitor Thread 2 ✔
Monitor Thread 3
Your Code
✔
✖
30
Monitor may not detect
Secondary host2
Secondary host3
Primary host1
MongoClient
Monitor Thread 1
Monitor Thread 2 ✔
Monitor Thread 3
Your Code
✔
✖
Insert
ConnectionFailure
31
So Retry
Secondary host2
Secondary host3
MongoClient
Monitor Thread 1
Monitor Thread 2 ✔
Monitor Thread 3
Your Code
✔
✖
Insert
32
Check for Primary
Secondary host2
Secondary host3
MongoClient
Monitor Thread 1
Monitor Thread 2 ✔
Monitor Thread 3
Your Code
✔
✖
Insert
33
Host 2 Is Primary
Primary host2
Secondary host3
MongoClient
Monitor Thread 1
Monitor Thread 2 ✔
Monitor Thread 3
Your Code
✔
✖
Insert
34
Steady State
Secondary host2
Secondary host3
Primary host1
MongoClient
Monitor Thread 1
Monitor Thread 2 ✔
Monitor Thread 3
Your Code
✔
✔
35
What Does This Mean? - Connect
import pymongo
client = pymongo.MongoClient()
try:
    # ismaster is a cheap command that forces a round trip to the server
    client.admin.command( "ismaster" )
except pymongo.errors.ConnectionFailure as e :
    print( "Cannot connect: %s" % e )
36
What Does This Mean? - Queries
import logging
import pymongo
def find_with_recovery( collection, query ) :
    try:
        return collection.find_one( query )
    except pymongo.errors.ConnectionFailure as e :
        # Retry once; by now the driver may have found the new primary
        logging.info( "Connection failure : %s" % e )
        return collection.find_one( query )
37
What Does This Mean? - Inserts
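import logging
import pymongo
from bson.objectid import ObjectId
def insert_with_recovery( collection, doc ) :
    # Set _id on the client so that a retry cannot create a second document
    doc[ "_id" ] = ObjectId()
    try:
        collection.insert_one( doc )
    except pymongo.errors.ConnectionFailure as e :
        logging.info( "Connection error: %s" % e )
        try:
            collection.insert_one( doc )
        except pymongo.errors.DuplicateKeyError:
            # The first attempt reached the server before the connection failed
            pass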
38
What Does This Mean? - Updates
collection.update_one( { "_id" : 1 }, { "$inc" : { "counter" : 1 } } )
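One reading of this slide is that an update such as $inc cannot be retried as casually as a find, because the server may have applied it before the connection dropped. The toy below simulates that case with stand-in classes; NetworkError and FlakyCounter are invented for the example and nothing here talks to MongoDB.

```python
class NetworkError(Exception):
    """Stand-in for pymongo.errors.ConnectionFailure."""

class FlakyCounter:
    """Pretend server: applies the first increment, then loses the reply."""
    def __init__(self):
        self.counter = 0
        self._fail_next = True

    def increment(self):
        self.counter += 1                         # the $inc was applied on the server
        if self._fail_next:
            self._fail_next = False
            raise NetworkError("reply lost")      # but the client never hears back

def increment_with_retry(target):
    try:
        target.increment()
    except NetworkError:
        target.increment()                        # blind retry, as in find_with_recovery

server = FlakyCounter()
increment_with_retry(server)
print(server.counter)
```

The caller asked for one increment and the counter ends up at 2, which is why a retry wrapper that is safe for reads is not automatically safe for this kind of update.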
39
Configuration
connectTimeoutMS : 20s
serverSelectionTimeoutMS : 30s
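These timeouts can be set as MongoClient keyword arguments or as connection string options. The sketch below only builds a connection string with the standard library, assuming the option names connectTimeoutMS and serverSelectionTimeoutMS; build_uri is a hypothetical helper and the values are illustrative, not recommendations.

```python
from urllib.parse import urlencode

def build_uri(hosts, **options):
    """Join seed hosts and options into a mongodb:// connection string."""
    return "mongodb://%s/?%s" % (",".join(hosts), urlencode(options))

uri = build_uri(
    ["host1", "host2"],
    replicaSet="replset",
    connectTimeoutMS=5000,              # illustrative: give up on a new socket after 5s
    serverSelectionTimeoutMS=10000,     # illustrative: wait up to 10s for a usable server
)
print(uri)
# The resulting string could then be passed to pymongo.MongoClient(uri).
```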
40
connectTimeoutMS
Secondary host2
Secondary host3
MongoClient
Monitor Thread 1
Monitor Thread 2 ✔
Monitor Thread 3
Your Code
✔
✖
Insert
connectTimeoutMS
serverSelectionTimeoutMS
41
More Reading
• The spec author, Jesse Jiryu Davis, has a collection of links and his better version of this talk
https://emptysqua.re/blog/server-discovery-and-monitoring-in-mongodb-drivers/
• The full server discovery and monitoring spec is on GitHub
https://github.com/mongodb/specifications/blob/master/source/server-discovery-and-monitoring/server-discovery-and-monitoring.rst
43
insert_one
• Stages
– Parse the parameters
– Get a socket to write data on
– Add the ObjectId
– Convert the whole insert command and parameters to a SON object
– Apply the writeConcern to the command
– Encode the message into a BSON object
– Send the message to the server via the socket (TCP/IP)
– Check for writeErrors (e.g. DuplicateKeyError)
– Check for writeConcernErrors (e.g. writeTimeout)
– Return Result object
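Some of the stages above can be sketched without a server. This is an approximation under stated assumptions: an OrderedDict stands in for SON, a random 12-byte hex string stands in for a real ObjectId, the exception classes and function names are invented, and the socket and BSON encoding steps are left out entirely.

```python
import os
from collections import OrderedDict

class WriteError(Exception):
    """Stand-in for errors such as DuplicateKeyError."""

class WriteConcernError(Exception):
    """Stand-in for a write concern failure such as a timeout."""

def build_insert_command(collection_name, doc, write_concern=None):
    # Parse the parameters.
    if not isinstance(doc, dict):
        raise TypeError("document must be a dict")
    # Add the _id on the client if the caller did not supply one.
    if "_id" not in doc:
        doc["_id"] = os.urandom(12).hex()         # placeholder, not a real ObjectId
    # Build the command as an ordered mapping; key order matters on the wire.
    command = OrderedDict([
        ("insert", collection_name),
        ("ordered", True),
        ("documents", [doc]),
    ])
    # Apply the write concern to the command.
    if write_concern:
        command["writeConcern"] = write_concern
    return command

def check_reply(reply):
    # Check for writeErrors first, then writeConcernError.
    if reply.get("writeErrors"):
        raise WriteError(reply["writeErrors"][0]["errmsg"])
    if reply.get("writeConcernError"):
        raise WriteConcernError(reply["writeConcernError"]["errmsg"])
    return reply["n"]

command = build_insert_command("col", {"a": "b"}, {"w": 1})
inserted = check_reply({"ok": 1, "n": 1})         # a reply shaped like a successful insert
print(list(command), inserted)
```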
44
Bulk Insert
bulker = collection.initialize_ordered_bulk_op()
bulker.insert( { "a" : "b" } )
bulker.insert( { "c" : "d" } )
bulker.insert( { "e" : "f" } )
try:
    bulker.execute()
except pymongo.errors.BulkWriteError as e :
    print( "Bulk write error : %s" % e.details )
45
Bulk Write
• Create Bulker object
• Accumulate operations
• Each operation is created as a SON object
• The operations are accumulated in a list
• Once execute is called
– For ordered, execute in the order added
– For unordered, execute INSERTs, UPDATEs, then DELETEs
• Errors will abort the whole batch unless no write concern is specified
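The accumulate-then-execute behaviour described above can be sketched as follows. The Bulker class here is a toy written for this example, not PyMongo's bulk API: it only shows how the accumulated list might be split into batches for the ordered and unordered cases, and sends nothing.

```python
from itertools import groupby

class Bulker:
    """Toy accumulator for write operations."""
    def __init__(self, ordered=True):
        self.ordered = ordered
        self.ops = []                             # (kind, payload) pairs in the order added

    def insert(self, doc):
        self.ops.append(("insert", doc))

    def update(self, spec):
        self.ops.append(("update", spec))

    def delete(self, spec):
        self.ops.append(("delete", spec))

    def batches(self):
        """Return the (kind, payloads) batches that execute() would send."""
        if self.ordered:
            # Keep the caller's order; only neighbouring ops of the same kind share a batch.
            return [
                (kind, [payload for _, payload in group])
                for kind, group in groupby(self.ops, key=lambda op: op[0])
            ]
        # Unordered: all inserts, then all updates, then all deletes.
        grouped = []
        for kind in ("insert", "update", "delete"):
            payloads = [payload for k, payload in self.ops if k == kind]
            if payloads:
                grouped.append((kind, payloads))
        return grouped

ordered, unordered = Bulker(ordered=True), Bulker(ordered=False)
for bulker in (ordered, unordered):
    bulker.insert({"a": "b"})
    bulker.delete({"x": 1})
    bulker.insert({"c": "d"})

print(ordered.batches())
print(unordered.batches())
```

With the same three operations, the ordered bulker produces three batches, while the unordered one merges the two inserts into a single batch ahead of the delete.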