EuroPython 2016 : A Deep Dive into the Pymongo Driver

A Deep Dive into the Pymongo Driver Joe Drumgoole Director of Developer Advocacy, EMEA 21-July-2016 V1.0

Upload: joe-drumgoole

Post on 06-Apr-2017

TRANSCRIPT

Page 1: EuroPython 2016 : A Deep Dive into the Pymongo Driver

A Deep Dive into the Pymongo Driver

Joe Drumgoole, Director of Developer Advocacy, EMEA

21-July-2016

V1.0

Page 2: EuroPython 2016 : A Deep Dive into the Pymongo Driver

MongoDB

MongoDB Query Language (MQL) + Native Drivers
MongoDB Document/JSON Data Model
WiredTiger, MMAP, In-memory, Encrypted, 3rd party
Management
Security
Sharded Clusters
Replica Sets

Page 3: EuroPython 2016 : A Deep Dive into the Pymongo Driver

Drivers and Frameworks

Morphia

MEAN Stack

Page 4: EuroPython 2016 : A Deep Dive into the Pymongo Driver

BSON Side Bar

• MongoDB uses a binary form of JSON called BSON (Binary JSON)
• Adds type and size information
• Allows efficient parsing and skipping
• You can use MongoDB drivers without ever knowing that BSON exists
• Open standard (http://bsonspec.org/, licensed under the Creative Commons)
• There are BSON libraries in every driver if you fancy trying it out
• Similar to Google Protocol Buffers
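
To make the "type and size information" point concrete, here is a minimal hand-rolled encoder for a flat document of string values, following the layout published at bsonspec.org. It is only a sketch: the function names are made up for this example, it handles strings only, and real code should use the `bson` package that ships with the driver.

```python
import struct


def encode_string_element(name: str, value: str) -> bytes:
    """Encode one string element: type byte 0x02, key, value length, value."""
    key = name.encode("utf-8") + b"\x00"      # keys are NUL-terminated
    data = value.encode("utf-8") + b"\x00"    # values are NUL-terminated too
    # The int32 length prefix is what lets a parser skip the value unread.
    return b"\x02" + key + struct.pack("<i", len(data)) + data


def encode_document(doc: dict) -> bytes:
    """Encode a flat dict of str -> str as a BSON document."""
    body = b"".join(encode_string_element(k, v) for k, v in doc.items())
    total = 4 + len(body) + 1                 # length prefix + body + trailing NUL
    return struct.pack("<i", total) + body + b"\x00"


encoded = encode_document({"hello": "world"})
print(len(encoded))  # 22
```

The leading four bytes hold the total document size, so a reader can jump over a whole document (or a single value) without parsing it.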

Page 5: EuroPython 2016 : A Deep Dive into the Pymongo Driver

Single Server

Driver

Mongod

Page 6: EuroPython 2016 : A Deep Dive into the Pymongo Driver

Replica Set

Driver

Secondary Secondary

Primary

Page 7: EuroPython 2016 : A Deep Dive into the Pymongo Driver

Replica Set Primary Failure

Driver

Secondary Secondary

Page 8: EuroPython 2016 : A Deep Dive into the Pymongo Driver

Replica Set Election

Driver

Secondary Secondary

Page 9: EuroPython 2016 : A Deep Dive into the Pymongo Driver

Replica Set New Primary

Driver

Primary Secondary

Page 10: EuroPython 2016 : A Deep Dive into the Pymongo Driver

Replica Set Recovery

Driver

Primary Secondary

Secondary

Page 11: EuroPython 2016 : A Deep Dive into the Pymongo Driver

Sharded Cluster

Driver
mongos mongos
Mongod Mongod Mongod
Mongod Mongod Mongod
Mongod Mongod Mongod

Page 12: EuroPython 2016 : A Deep Dive into the Pymongo Driver

Driver Responsibilities

https://github.com/mongodb/mongo-python-driver

Driver
Authentication & Security
Python <-> BSON
Error Handling & Recovery
Wire Protocol
Topology Management
Connection Pool

Page 14: EuroPython 2016 : A Deep Dive into the Pymongo Driver

Example API Calls

import pymongo

client = pymongo.MongoClient( host="localhost", port=27017 )
database = client[ "test_database" ]
collection = database[ "test_collection" ]

collection.insert_one( { "hello" : "world", "goodbye" : "world" } )

collection.find_one( { "hello" : "world" } )

collection.update_one( { "hello" : "world" }, { "$set" : { "buenos dias" : "world" } } )

collection.delete_one( { "hello" : "world" } )

Page 15: EuroPython 2016 : A Deep Dive into the Pymongo Driver

Start MongoClient

c = MongoClient( "host1,host2", replicaSet="replset" )

Page 16: EuroPython 2016 : A Deep Dive into the Pymongo Driver

Client Side View

Secondary host2
Secondary host3
Primary host1
MongoClient

MongoClient( "host1,host2", replicaSet="replset" )

Page 17: EuroPython 2016 : A Deep Dive into the Pymongo Driver

Client Side View

Secondary host2
Secondary host3
Primary host1
MongoClient
Monitor Thread 1
Monitor Thread 2

{ ismaster : False, secondary: True, hosts : [ host1, host2, host3 ] }

Page 18: EuroPython 2016 : A Deep Dive into the Pymongo Driver

What Does ismaster show?

>>> pprint.pprint( db.command( "ismaster" ) )
{u'hosts': [u'JD10Gen-old.local:27017',
            u'JD10Gen-old.local:27018',
            u'JD10Gen-old.local:27019'],
 u'ismaster': False,
 u'secondary': True,
 u'setName': u'replset',
 …}
>>>

Page 19: EuroPython 2016 : A Deep Dive into the Pymongo Driver

Topology

Current Topology + ismaster → New Topology
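
The topology step can be read as a pure function: take the current view of the replica set plus one host's ismaster reply, and return the new view. The sketch below is a heavily simplified illustration of that idea, not the driver's implementation; `update_topology` and the string labels are names invented for this example.

```python
def update_topology(topology: dict, host: str, reply: dict) -> dict:
    """Return a new topology given one host's ismaster reply."""
    new = dict(topology)
    # Every host named in the reply becomes known, even if not yet contacted.
    for member in reply.get("hosts", []):
        new.setdefault(member, "Unknown")
    if reply.get("ismaster"):
        # Only one primary at a time: forget any previously recorded primary.
        for member in list(new):
            if new[member] == "Primary":
                new[member] = "Unknown"
        new[host] = "Primary"
    elif reply.get("secondary"):
        new[host] = "Secondary"
    else:
        new[host] = "Unknown"
    return new


hosts = ["host1", "host2", "host3"]
topology = {"host1": "Unknown", "host2": "Unknown"}  # the two seed hosts

# host2 answers first and reveals host3.
topology = update_topology(
    topology, "host2", {"ismaster": False, "secondary": True, "hosts": hosts}
)
after_host2 = dict(topology)

# host1 answers later and turns out to be the primary.
topology = update_topology(
    topology, "host1", {"ismaster": True, "secondary": False, "hosts": hosts}
)
print(topology)
```

Because the reply from a single member lists every host, the client can discover host3 even though it was never in the seed list.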

Page 20: EuroPython 2016 : A Deep Dive into the Pymongo Driver

Client Side View

Secondary host2
Secondary host3
Primary host1
MongoClient
Monitor Thread 1
Monitor Thread 2 ✔

Page 21: EuroPython 2016 : A Deep Dive into the Pymongo Driver

Client Side View

Secondary host2
Secondary host3
Primary host1
MongoClient
Monitor Thread 1
Monitor Thread 2 ✔
Monitor Thread 3

Page 22: EuroPython 2016 : A Deep Dive into the Pymongo Driver

Client Side View

Secondary host2
Secondary host3
Primary host1
MongoClient
Monitor Thread 1
Monitor Thread 2 ✔
Monitor Thread 3
Your Code

Page 23: EuroPython 2016 : A Deep Dive into the Pymongo Driver

Next Is Insert

client = MongoClient( "host1,host2", replicaSet="replset" )
client.db.col.insert_one( { "a" : "b" } )

Page 24: EuroPython 2016 : A Deep Dive into the Pymongo Driver

Insert Will Block

Secondary host2
Secondary host3
Primary host1
MongoClient
Monitor Thread 1
Monitor Thread 2 ✔
Monitor Thread 3
Your Code
Insert

Page 25: EuroPython 2016 : A Deep Dive into the Pymongo Driver

ismaster response from Host 1

Secondary host2
Secondary host3
Primary host1
MongoClient
Monitor Thread 1
Monitor Thread 2 ✔
Monitor Thread 3
Your Code
Insert
ismaster

Page 26: EuroPython 2016 : A Deep Dive into the Pymongo Driver

Now Write Can Proceed

Secondary host2
Secondary host3
Primary host1
MongoClient
Monitor Thread 1
Monitor Thread 2 ✔
Monitor Thread 3
Your Code
Insert
Insert
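
The last three slides show the insert waiting until a monitor thread reports a primary. One way to picture that hand-off is a condition variable shared between the monitors and the calling code. This is a sketch under that assumption: the `Topology` class and its method names are hypothetical and do not mirror PyMongo's internals.

```python
import threading


class Topology:
    """Shared state: monitor threads write to it, application code waits on it."""

    def __init__(self):
        self._cond = threading.Condition()
        self._primary = None

    def on_ismaster(self, host, reply):
        """Called by a monitor thread when a host answers ismaster."""
        with self._cond:
            if reply.get("ismaster"):
                self._primary = host
                self._cond.notify_all()  # wake any blocked writers

    def select_primary(self, timeout_s):
        """Block until a primary is known, or raise after timeout_s seconds."""
        with self._cond:
            found = self._cond.wait_for(
                lambda: self._primary is not None, timeout=timeout_s
            )
            if not found:
                raise TimeoutError("no primary found within %.2fs" % timeout_s)
            return self._primary


topology = Topology()
# host2 has already answered, but it is only a secondary.
topology.on_ismaster("host2", {"ismaster": False, "secondary": True})
# host1 answers a little later, from its own monitor thread.
timer = threading.Timer(0.1, topology.on_ismaster, args=("host1", {"ismaster": True}))
timer.start()
primary = topology.select_primary(timeout_s=2.0)  # the "insert" blocks here
timer.join()
print(primary)  # host1
```

The timeout on the wait plays the role of the server selection timeout: if no primary shows up in time, the caller gets an error instead of blocking forever.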

Page 27: EuroPython 2016 : A Deep Dive into the Pymongo Driver

Later Host 3 Responds

Secondary host2
Secondary host3
Primary host1
MongoClient
Monitor Thread 1
Monitor Thread 2 ✔
Monitor Thread 3
Your Code

Page 28: EuroPython 2016 : A Deep Dive into the Pymongo Driver

Steady State

Secondary host2
Secondary host3
Primary host1
MongoClient
Monitor Thread 1
Monitor Thread 2 ✔
Monitor Thread 3
Your Code

Page 29: EuroPython 2016 : A Deep Dive into the Pymongo Driver

Life Intervenes

Secondary host2
Secondary host3
Primary host1
MongoClient
Monitor Thread 1
Monitor Thread 2 ✔
Monitor Thread 3
Your Code

Page 30: EuroPython 2016 : A Deep Dive into the Pymongo Driver

Monitor may not detect

Secondary host2
Secondary host3
Primary host1
MongoClient
Monitor Thread 1
Monitor Thread 2 ✔
Monitor Thread 3
Your Code
Insert
ConnectionFailure

Page 31: EuroPython 2016 : A Deep Dive into the Pymongo Driver

So Retry

Secondary host2
Secondary host3
MongoClient
Monitor Thread 1
Monitor Thread 2 ✔
Monitor Thread 3
Your Code
Insert

Page 32: EuroPython 2016 : A Deep Dive into the Pymongo Driver

Check for Primary

Secondary host2
Secondary host3
MongoClient
Monitor Thread 1
Monitor Thread 2 ✔
Monitor Thread 3
Your Code
Insert

Page 33: EuroPython 2016 : A Deep Dive into the Pymongo Driver

Host 2 Is Primary

Primary host2
Secondary host3
MongoClient
Monitor Thread 1
Monitor Thread 2 ✔
Monitor Thread 3
Your Code
Insert

Page 34: EuroPython 2016 : A Deep Dive into the Pymongo Driver

Steady State

Secondary host2
Secondary host3
Primary host1
MongoClient
Monitor Thread 1
Monitor Thread 2 ✔
Monitor Thread 3
Your Code

Page 35: EuroPython 2016 : A Deep Dive into the Pymongo Driver

What Does This Mean? - Connect

import pymongo

client = pymongo.MongoClient()

try:
    client.admin.command( "ismaster" )
except pymongo.errors.ConnectionFailure as e :
    print( "Cannot connect: %s" % e )

Page 36: EuroPython 2016 : A Deep Dive into the Pymongo Driver

What Does This Mean? - Queries

import logging
import pymongo

def find_with_recovery( collection, query ) :
    try:
        return collection.find_one( query )
    except pymongo.errors.ConnectionFailure as e :
        logging.info( "Connection failure : %s" % e )
        return collection.find_one( query )

Page 37: EuroPython 2016 : A Deep Dive into the Pymongo Driver

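What Does This Mean? - Inserts

import logging
import pymongo
from bson import ObjectId

def insert_with_recovery( collection, doc ) :
    doc[ "_id" ] = ObjectId()
    try:
        collection.insert_one( doc )
    except pymongo.errors.ConnectionFailure as e :
        logging.info( "Connection error: %s" % e )
        try:
            collection.insert_one( doc )
        except pymongo.errors.DuplicateKeyError:
            # The first attempt reached the server before the connection failed
            pass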
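
Why fix the `_id` before the first attempt? If the write reaches the server but the reply is lost, the retry carries the same key and is rejected as a duplicate instead of creating a second document. The simulation below shows this with an in-memory fake, so it runs without a server. The exception classes and `FlakyCollection` are stand-ins written for this example, and `os.urandom` replaces `ObjectId()`. Note that the retry sits in its own `try` block: an exception raised inside an `except` handler is not caught by sibling `except` clauses of the same `try`.

```python
import os


class ConnectionFailure(Exception):
    """Stand-in for pymongo.errors.ConnectionFailure."""


class DuplicateKeyError(Exception):
    """Stand-in for pymongo.errors.DuplicateKeyError."""


class FlakyCollection:
    """In-memory fake: the first insert is stored, but its reply is lost."""

    def __init__(self):
        self.docs = {}
        self._fail_next = True

    def insert_one(self, doc):
        if doc["_id"] in self.docs:
            raise DuplicateKeyError(doc["_id"])
        self.docs[doc["_id"]] = dict(doc)
        if self._fail_next:
            self._fail_next = False
            raise ConnectionFailure("connection dropped after the write")


def insert_with_recovery(collection, doc):
    # Fix the _id on the client so both attempts carry the same key.
    doc["_id"] = os.urandom(12).hex()  # stand-in for bson.ObjectId()
    try:
        collection.insert_one(doc)
    except ConnectionFailure:
        try:
            collection.insert_one(doc)
        except DuplicateKeyError:
            pass  # the first attempt did reach the server


collection = FlakyCollection()
insert_with_recovery(collection, {"a": "b"})
print(len(collection.docs))  # 1
```

Without the client-side `_id`, each attempt would get a fresh key from the driver and the retry could silently store the document twice.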

Page 38: EuroPython 2016 : A Deep Dive into the Pymongo Driver

What Does This Mean? - Updates

collection.update_one( { "_id" : 1 }, { "$inc" : { "counter" : 1 } } )
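
Unlike the insert, a `$inc` update cannot simply be retried. If the first attempt was applied and only the reply was lost, the retry increments the counter a second time. The fake below simulates that case without a server; `FlakyCounter` and `ConnectionFailure` are stand-ins written for this example.

```python
class ConnectionFailure(Exception):
    """Stand-in for pymongo.errors.ConnectionFailure."""


class FlakyCounter:
    """In-memory fake of a document with a counter field."""

    def __init__(self):
        self.counter = 0
        self._fail_next = True

    def increment(self):
        """Stand-in for an update with {"$inc": {"counter": 1}}."""
        self.counter += 1  # the server applies the update...
        if self._fail_next:
            self._fail_next = False
            raise ConnectionFailure("reply lost")  # ...but the client never hears back


def increment_with_retry(target):
    try:
        target.increment()
    except ConnectionFailure:
        target.increment()  # blind retry: may apply the increment twice


flaky = FlakyCounter()
increment_with_retry(flaky)
print(flaky.counter)  # 2, although the caller asked for one increment
```

The client cannot tell "failed before the write" from "failed after the write", so non-idempotent updates need application-level handling rather than a blind retry.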

Page 39: EuroPython 2016 : A Deep Dive into the Pymongo Driver

Configuration

connectTimeoutMS : 30s
serverSelectionTimeoutMS : 30s
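
Both timeouts can be passed as connection-string options, in milliseconds. The helper below builds such a string with the standard library only; `build_uri` is a name invented for this example, and the server timeout is spelled `serverSelectionTimeoutMS`, the option name PyMongo accepts. The hosts and replica set name are the ones used in the earlier slides.

```python
from urllib.parse import urlencode


def build_uri(hosts, **options):
    """Join seed hosts and options into a mongodb:// connection string."""
    return "mongodb://%s/?%s" % (",".join(hosts), urlencode(options))


uri = build_uri(
    ["host1", "host2"],
    replicaSet="replset",
    connectTimeoutMS=30000,           # 30s, expressed in milliseconds
    serverSelectionTimeoutMS=30000,   # 30s, expressed in milliseconds
)
print(uri)
```

The resulting string can be handed to `MongoClient` as its first argument; the same options can also be given as keyword arguments.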

Page 40: EuroPython 2016 : A Deep Dive into the Pymongo Driver

connectTimeoutMS

Secondary host2
Secondary host3
MongoClient
Monitor Thread 1
Monitor Thread 2 ✔
Monitor Thread 3
Your Code
Insert
connectTimeoutMS
serverSelectionTimeoutMS

Page 41: EuroPython 2016 : A Deep Dive into the Pymongo Driver

More Reading

• The spec author, A. Jesse Jiryu Davis, has a collection of links and his better version of this talk: https://emptysqua.re/blog/server-discovery-and-monitoring-in-mongodb-drivers/

• The full server discovery and monitoring spec is on GitHub: https://github.com/mongodb/specifications/blob/master/source/server-discovery-and-monitoring/server-discovery-and-monitoring.rst

Page 42: EuroPython 2016 : A Deep Dive into the Pymongo Driver
Page 43: EuroPython 2016 : A Deep Dive into the Pymongo Driver

insert_one

• Stages
  – Parse the parameters
  – Get a socket to write data on
  – Add the ObjectId
  – Convert the whole insert command and parameters to a SON object
  – Apply the writeConcern to the command
  – Encode the message into a BSON object
  – Send the message to the server via the socket (TCP/IP)
  – Check for writeErrors (e.g. DuplicateKeyError)
  – Check for writeConcernErrors (e.g. writeTimeout)
  – Return Result object
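
The middle stages can be sketched without a server: add an `_id`, build the ordered command document, attach the write concern, then inspect the reply for errors. This is an illustration only. The helper names and exception classes are hypothetical, `OrderedDict` stands in for the driver's SON type, `os.urandom` stands in for `ObjectId()`, and the reply is hand-written in the shape the server's insert command returns.

```python
import os
from collections import OrderedDict


class DuplicateKeyError(Exception):
    """Stand-in for pymongo.errors.DuplicateKeyError."""


class WriteConcernError(Exception):
    """Stand-in for pymongo.errors.WriteConcernError."""


def build_insert_command(collection_name, doc, write_concern):
    """Add an _id, build the ordered command, attach the write concern."""
    if "_id" not in doc:
        doc["_id"] = os.urandom(12).hex()  # stand-in for bson.ObjectId()
    command = OrderedDict([
        ("insert", collection_name),  # the command name must be the first key
        ("ordered", True),
        ("documents", [doc]),
    ])
    if write_concern:
        command["writeConcern"] = write_concern
    return command


def check_reply(reply):
    """Turn error fields in the server reply into exceptions."""
    for error in reply.get("writeErrors", []):
        if error.get("code") == 11000:  # duplicate key
            raise DuplicateKeyError(error.get("errmsg"))
        raise RuntimeError(error.get("errmsg"))
    if "writeConcernError" in reply:
        raise WriteConcernError(reply["writeConcernError"].get("errmsg"))
    return reply.get("n", 0)  # number of documents inserted


command = build_insert_command("col", {"a": "b"}, {"w": 1})
print(list(command))
inserted = check_reply({"ok": 1, "n": 1})  # hand-written success reply
```

Key order matters here, which is why the command is built as an ordered mapping rather than a plain unordered one.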

Page 44: EuroPython 2016 : A Deep Dive into the Pymongo Driver

Bulk Insert

import pymongo

bulker = collection.initialize_ordered_bulk_op()
bulker.insert( { "a" : "b" } )
bulker.insert( { "c" : "d" } )
bulker.insert( { "e" : "f" } )
try:
    bulker.execute()
except pymongo.errors.BulkWriteError as e :
    print( "Bulk write error : %s" % e.details )

Page 45: EuroPython 2016 : A Deep Dive into the Pymongo Driver

Bulk Write

• Create Bulker object
• Accumulate operations
• Each operation is created as a SON object
• The operations are accumulated in a list
• Once execute is called
  – For ordered, execute in the order added
  – For unordered, execute INSERTs, UPDATEs, then DELETEs
• Errors will abort the whole batch unless no write concern specified
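
The accumulate-then-execute pattern described above can be sketched in a few lines. This toy `Bulker` only records operations and reports the order in which they would run; it sends nothing to a server, and its class and method names are invented for this example rather than taken from PyMongo.

```python
class Bulker:
    """Accumulates write operations, then reports the order they would run in."""

    def __init__(self, ordered=True):
        self.ordered = ordered
        self.ops = []  # operations are accumulated in a list

    def insert(self, doc):
        self.ops.append(("insert", doc))

    def update(self, selector, change):
        self.ops.append(("update", (selector, change)))

    def delete(self, selector):
        self.ops.append(("delete", selector))

    def execution_order(self):
        if self.ordered:
            return list(self.ops)  # exactly as added
        rank = {"insert": 0, "update": 1, "delete": 2}
        # sorted() is stable, so operations of one type keep their relative order.
        return sorted(self.ops, key=lambda op: rank[op[0]])


bulker = Bulker(ordered=False)
bulker.delete({"a": "b"})
bulker.insert({"c": "d"})
bulker.update({"c": "d"}, {"$set": {"e": "f"}})
bulker.insert({"g": "h"})
order = [kind for kind, _ in bulker.execution_order()]
print(order)  # ['insert', 'insert', 'update', 'delete']
```

Grouping by type lets an unordered bulk send each group as a single batch, at the cost of losing the order in which the caller queued the operations.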