Protecting MongoDB With A RESTful API
Alon Horev
Israel MongoDB user group
May 2013
! Cellular networks are choking
! Automatic optimization to the rescue:
  1. Collect analytics
  2. Analyze and update network configuration
  3. Back to 1!
! SON – self-optimizing networks
! An example: a loaded cell
! We’re a proud Python shop
Agenda
! Why and how we migrated to MongoDB
! Do you need an API?
! What is a RESTful API?
! A review of Intucell’s API
! MongoDB best practices
Why MongoDB?
! Scale and failover just work!
! Data center partition tolerance
! Development speed
  ! Fast prototyping – schema changes frequently
  ! Slows down when joins and transactions are needed
Migration Challenges
! Migrating from MySQL to MongoDB
! People have direct access to the DB
  ! 20 developers
  ! 40 analysts and tech support
! “No joins? SQL? Transactions? GUI?”
! A lot to make up for!
Why An API?
! Complement mongo – reports (joins!) and PQL
! Hide implementation – data store(s), short names
! Security – auth isn't enough: {$where: 'while(1){}'}
! Resource management – run slow queries on slaves
! Schema and referential integrity
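The `$where` example above hints at one job an API layer can take on before a query ever reaches the database: refusing operators that run server-side JavaScript. A minimal sketch, assuming queries arrive as plain dictionaries; the `FORBIDDEN_OPERATORS` set and the `reject_forbidden_operators` name are made up for illustration and are not taken from the talk:

```python
FORBIDDEN_OPERATORS = {"$where"}  # assumed deny-list; extend as needed


def reject_forbidden_operators(query):
    """Raise ValueError if a forbidden operator appears anywhere in the query."""
    if isinstance(query, dict):
        for key, value in query.items():
            if key in FORBIDDEN_OPERATORS:
                raise ValueError("operator not allowed: {}".format(key))
            reject_forbidden_operators(value)
    elif isinstance(query, (list, tuple)):
        # operators such as $or carry a list of sub-queries
        for item in query:
            reject_forbidden_operators(item)
    return query


# An ordinary query passes through unchanged.
safe_query = reject_forbidden_operators({"age": {"$gt": 60}})
```

Walking the whole structure matters because a forbidden operator can hide inside `$or`/`$and` lists, not just at the top level.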
Type Of API
! Small layer on top of your driver
  ! Dictionaries and hashes - not OO!
  ! MongoEngine/MongoKit (ODM)
  ! Your own!
! RESTful
  ! Cross language
  ! Inherent to web apps
  ! Standards for caching, auth, throttling
RESTful
! “Representational state transfer”
! Not a standard but an architectural style
! Basically it’s a bunch of guidelines!
! Real world APIs break some of them
! HTTP as a communication layer
! Implementing CRUD using HTTP
RESTful Routes
Resource          Method and Route      Meaning
Users collection  GET /users/           Read users
                  DELETE /users/        Delete users
                  PUT /users/           Update users
                  POST /users/          Create user(s)
A user            GET /users/<id>       Read a user
                  DELETE /users/<id>    Delete a user
                  PUT /users/<id>       Update a user
                  POST /users/<id>      Create a user
* RESTful APIs usually don’t support batch operations of create/update/delete
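The route table above maps naturally onto a lookup keyed by HTTP method and by whether the path names the collection or a single document. A framework-free sketch; the `describe` helper and the way it detects an `<id>` segment are assumptions made for illustration:

```python
# (method, is_single_document) -> CRUD meaning, mirroring the route table
ACTIONS = {
    ("GET", False): "read users",
    ("DELETE", False): "delete users",
    ("PUT", False): "update users",
    ("POST", False): "create user(s)",
    ("GET", True): "read a user",
    ("DELETE", True): "delete a user",
    ("PUT", True): "update a user",
    ("POST", True): "create a user",
}


def describe(method, path):
    """Return the CRUD meaning of a request such as ('GET', '/users/42')."""
    segments = [s for s in path.split("/") if s]
    if not segments or segments[0] != "users" or len(segments) > 2:
        raise LookupError("unknown route: {}".format(path))
    # a second path segment is treated as the document id
    return ACTIONS[(method.upper(), len(segments) == 2)]
```

For example, `describe("PUT", "/users/5192605a9716ab5a94b37d3c")` resolves to the single-document update, while `describe("PUT", "/users/")` resolves to the collection-level one.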
HTTP Crash Course
GET /search?q=foo&source=web HTTP/1.1
Host: www.google.co.il
Cache-Control: max-age=0
User-Agent: Mozilla/5.0
Accept: text/html,application/xml
Accept-Encoding: gzip,deflate,sdch
Cookie: PREF=ID=9a768e836b317d:U=fd620232bd98bd
* Note that I removed and shortened some headers
* Query strings are limited to about 2k characters (browser specific)
HTTP Crash Course
POST /api/v1/system/auth/users/alonho/ HTTP/1.1
Host: localhost
Content-Length: 20
Content-Type: application/json
User-Agent: python-requests/0.9.3
Cookie: token=6f01a9decd518f5cf5b4e14bddad

{"password": "none"}
* Note that I removed and shortened some headers
* In practice, content (a body) is sent only with POST/PUT
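To connect the raw POST request above to code, here is a sketch that builds an equivalent request with Python's standard library instead of the `requests` package seen in the User-Agent header. Nothing is sent; the cookie header is left out on purpose:

```python
import json
from urllib.request import Request

# Serialize the JSON body shown in the slide.
body = json.dumps({"password": "none"}).encode("utf-8")

request = Request(
    "http://localhost/api/v1/system/auth/users/alonho/",
    data=body,
    headers={"Content-Type": "application/json"},
    method="POST",
)

# Nothing is sent here; urllib.request.urlopen(request) would perform the call.
print(request.get_method(), request.selector)
print(len(body))  # the value an HTTP client would put in Content-Length
```

The method, the path, and the body length line up with the first line, the resource, and the `Content-Length` header of the request in the slide.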
CLI for HTTP
! A CLI can make your life easier
! Each API call is defined by:
  ! A resource
  ! A method
  ! Parameters
% son_cli --create users name='alon'
+--------------------------+------+
| id                       | name |
+==========================+======+
| 5192605a9716ab5a94b37d3c | alon |
+--------------------------+------+
Resource Generation
! We already use MongoEngine
  ! Declarative
  ! Enforces schema
  ! Supports inheritance (multiple types in one collection)
    from mongoengine import Document, IntField, StringField

    class User(Document):
        name = StringField(required=True)
        age = IntField(min_value=13, help_text='Years alive', required=True)

    # register_mongo_resource is our own helper, not part of MongoEngine
    register_mongo_resource(User, '/users')
Create
% son_cli -c users age=3
{'error': 'Bad Request', 'code': 400, 'message': 'Value 3 for field "age" is less than minimum value: 13'}
% son_cli -c users name='alon' age=120
+--------------------------+------+-----+
| id                       | name | age |
+==========================+======+=====+
| 5192605a9716ab5a94b37d3c | alon | 120 |
+--------------------------+------+-----+
Read
% son_cli -r users
+--------------------------+------+-----+
| id                       | name | age |
+==========================+======+=====+
| 5192605a9716ab5a94b37d3c | alon | 120 |
+--------------------------+------+-----+
| 5192608d9716ab5a94b37d3d | john | 100 |
+--------------------------+------+-----+
| 519265909716ab5a94b37d3e | snow | 30  |
+--------------------------+------+-----+
Sane defaults: a read returns the first 50 documents unless told otherwise
Read Less
% son_cli -r users page_size=2 page=0 fields=name,age
+------+-----+
| name | age |
+======+=====+
| alon | 120 |
+------+-----+
| john | 100 |
+------+-----+
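The `page_size`, `page` and `fields` parameters, together with the default of 50 documents, can be translated into skip/limit/projection values of the kind PyMongo's `find` accepts. A minimal sketch; `paging_args` and `DEFAULT_PAGE_SIZE` are hypothetical names, not the API's actual code:

```python
DEFAULT_PAGE_SIZE = 50  # the default number of documents a read returns


def paging_args(page=0, page_size=DEFAULT_PAGE_SIZE, fields=None):
    """Translate CLI-style parameters into skip/limit/projection values."""
    # CLI parameters arrive as strings, so convert before doing arithmetic
    page, page_size = int(page), int(page_size)
    if page < 0 or page_size < 1:
        raise ValueError("page must be >= 0 and page_size >= 1")
    projection = fields.split(",") if fields else None
    return {"skip": page * page_size, "limit": page_size, "projection": projection}
```

With the parameters from the command above, `paging_args(page="0", page_size="2", fields="name,age")` yields a skip of 0, a limit of 2, and a projection of `name` and `age`.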
Read Ordered
% son_cli -r users fields=name,age order=age
+------+-----+
| name | age |
+======+=====+
| snow | 30  |
+------+-----+
| john | 100 |
+------+-----+
| alon | 120 |
+------+-----+
To order by ascending age and descending name:
% son_cli -r users order=age,-name
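An `order` value such as `age,-name` can be turned into a list of (field, direction) pairs, which is the shape PyMongo's `sort` takes (1 for ascending, -1 for descending). A small sketch; `parse_order` is an illustrative name:

```python
def parse_order(order):
    """Turn a string such as 'age,-name' into (field, direction) pairs."""
    sort_spec = []
    for token in order.split(","):
        token = token.strip()
        if not token:
            continue  # tolerate stray commas
        if token.startswith("-"):
            sort_spec.append((token[1:], -1))  # leading "-" means descending
        else:
            sort_spec.append((token, 1))  # no prefix means ascending
    return sort_spec
```

The order of the pairs is significant: the first field is the primary sort key and later fields only break ties.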
Read Filtered
% son_cli -r users query='age < 40 or name == "john"'
+--------------------------+------+-----+
| id                       | name | age |
+==========================+======+=====+
| 5192608d9716ab5a94b37d3d | john | 100 |
+--------------------------+------+-----+
| 519265909716ab5a94b37d3e | snow | 30  |
+--------------------------+------+-----+
Update
% son_cli -u users.5192605a9716ab5a94b37d3c name=anakin
+--------------------------+--------+-----+
| id                       | name   | age |
+==========================+========+=====+
| 5192605a9716ab5a94b37d3c | anakin | 120 |
+--------------------------+--------+-----+
% son_cli -u users query='age >= 120' age=100
+-------+
| count |
+=======+
| 1     |
+-------+
Delete
% son_cli -d users.5192605a9716ab5a94b37d3c
+--------------------------+--------+-----+
| id                       | name   | age |
+==========================+========+=====+
| 5192605a9716ab5a94b37d3c | anakin | 120 |
+--------------------------+--------+-----+
% son_cli -d users query='age >= 120'
+-------+
| count |
+=======+
| 1     |
+-------+
Aggregations API
% son_cli -r users.view.count
+-------+
| count |
+=======+
| 4     |
+-------+
% son_cli -r users.view.count sum=age
+-------+-----+
| count | age |
+=======+=====+
| 4     | 321 |
+-------+-----+
Aggregations API
% son_cli -r users.view.count groupby='age > 60'
+-------+----------+
| count | age > 60 |
+=======+==========+
| 3     | True     |
+-------+----------+
| 1     | False    |
+-------+----------+
% son_cli -r users.view.count groupby='age > 60,age % 2' sum=age
+-------+---------+----------+-----+
| count | age % 2 | age > 60 | age |
+=======+=========+==========+=====+
| 1     | 1       | True     | 71  |
+-------+---------+----------+-----+
| 2     | 0       | True     | 220 |
+-------+---------+----------+-----+
| 1     | 0       | False    | 30  |
+-------+---------+----------+-----+
Output Format
% son_cli -r users.view.count groupby='age > 60' format=csv
"count","age > 60"
"3","True"
"1","False"
% son_cli --json -r users.view.count fields='age > 60'
[
  {
    "count": 3,
    "age > 60": true
  },
  {
    "count": 1,
    "age > 60": false
  }
]
Schema
% son_cli --json -r users.schema
{
  "type": "object",
  "properties": {
    "age": {
      "minimum": 13,
      "type": "integer",
      "description": "Years alive"
    },
    "name": {
      "type": "string"
    },
    "id": {
      "type": "string"
    }
  }
}
This JSON describing JSON is called JSON Schema
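A schema like the one above is enough to reproduce the kind of "less than minimum value" error shown in the Create example. The sketch below checks only the `type` and `minimum` keywords and is not a complete JSON Schema validator; `validate` and `JSON_TYPES` are illustrative names:

```python
USER_SCHEMA = {
    "type": "object",
    "properties": {
        "age": {"minimum": 13, "type": "integer", "description": "Years alive"},
        "name": {"type": "string"},
        "id": {"type": "string"},
    },
}

# Only the two JSON types used by the schema above are mapped here.
JSON_TYPES = {"integer": int, "string": str}


def validate(document, schema):
    """Return a list of error messages; an empty list means the document is valid."""
    errors = []
    for field, value in document.items():
        rules = schema["properties"].get(field)
        if rules is None:
            errors.append("unknown field: {}".format(field))
            continue
        if not isinstance(value, JSON_TYPES[rules["type"]]):
            errors.append("{} must be of type {}".format(field, rules["type"]))
        elif "minimum" in rules and value < rules["minimum"]:
            errors.append("{} is less than minimum value: {}".format(field, rules["minimum"]))
    return errors
```

Because the schema is itself plain JSON, a client in any language can fetch it and run the same checks before sending a request.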
Defying REST
! Collection level updates are rarely seen
  ! Performance – how long will it take?
! Query strings too long for GET (2k)
  ! Fall back to POST/PUT (lose caching)
! Extend OPTIONS for route completion
  ! OPTIONS returns supported methods
  ! Added an extension that returns routes
Route Discovery
% curl -X OPTIONS http://localhost/api/v1/
{'options': ['users/', 'posts/']}
% curl -X OPTIONS http://localhost/api/v1/users/
{'options': ['alon', 'john']}
% curl http://localhost/api/v1/users/alon
{'name': 'alon', 'twitter': 'alonhorev'}
* Available as an extension to Flask called route-options
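The completion behaviour shown in the OPTIONS responses can be approximated without any web framework: given the known routes, list the next path segment under a prefix. This is a sketch of the idea only, not the code of the Flask extension; `route_options` and the sample `ROUTES` list are hypothetical:

```python
def route_options(routes, prefix):
    """List the next path segments available under `prefix`."""
    prefix_parts = [p for p in prefix.split("/") if p]
    depth = len(prefix_parts)
    options = set()
    for route in routes:
        parts = [p for p in route.split("/") if p]
        if parts[:depth] == prefix_parts and len(parts) > depth:
            # a trailing slash marks a segment that has children of its own
            has_children = len(parts) > depth + 1
            options.add(parts[depth] + "/" if has_children else parts[depth])
    return sorted(options)


ROUTES = [
    "/api/v1/users/alon",
    "/api/v1/users/john",
    "/api/v1/posts/first",  # made-up post so that posts/ has a child
]
```

Calling `route_options(ROUTES, "/api/v1/users/")` lists the two user names, and calling it with `/api/v1/` lists the two collections with trailing slashes.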
Documentation
! Exposed through the API at /api/v1/docs
! Displayed visually in the GUI
PQL
Querying
Let's filter some users by names:
Mongo:
    user_names = ['foo', 'bar']
    db.users.find({'name': {'$in': user_names}})
SQL:
    name_list = ', '.join(map(sql_escape, user_names))
    sql = 'select * from users where name in ({})'.format(name_list)
* SQL users: do yourselves a favor and use an ORM.
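The SQL version above depends on a hand-written `sql_escape`. When an ORM is not an option, driver-level placeholders are a common alternative to string formatting. A self-contained sketch using the standard library's `sqlite3` with made-up rows; the slide itself does not name a SQL driver:

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute("create table users (name text, age integer)")
connection.executemany(
    "insert into users values (?, ?)",
    [("foo", 30), ("bar", 71), ("baz", 100)],  # made-up rows
)

user_names = ["foo", "bar"]
# One "?" per value; the driver does the escaping instead of string formatting.
placeholders = ", ".join("?" for _ in user_names)
sql = "select name from users where name in ({}) order by name".format(placeholders)
rows = connection.execute(sql, user_names).fetchall()
print(rows)
```

Only the number of placeholders is formatted into the statement; the values themselves never touch the SQL string.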
Querying
Let's find users older than 60 or younger than 20:
Mongo:
    db.users.find({'$or': [{'age': {'$gt': 60}}, {'age': {'$lt': 20}}]})
SQL:
    sql = 'select * from users where age > 60 or age < 20'
PQL
Mongo's queries are easier to compose
SQL is easier to write when invoking ad-hoc queries
PQL was born – Mongo queries for humans!
>>> pql.find('age < 20 or age > 60')
{'$or': [{'age': {'$lt': 20}},
{'age': {'$gt': 60}}]}
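To give a feel for how an expression string can become a Mongo query, here is a toy translator built on Python's `ast` module. It handles only simple comparisons joined by `and`/`or` and is not PQL's source; `to_mongo` and `_convert` are illustrative names:

```python
import ast

COMPARISONS = {ast.Lt: "$lt", ast.LtE: "$lte", ast.Gt: "$gt",
               ast.GtE: "$gte", ast.NotEq: "$ne"}
BOOLEANS = {ast.And: "$and", ast.Or: "$or"}


def to_mongo(expression):
    """Translate e.g. 'age < 20 or age > 60' into a MongoDB query dict."""
    return _convert(ast.parse(expression, mode="eval").body)


def _convert(node):
    if isinstance(node, ast.BoolOp):
        # 'a or b or c' arrives as a single BoolOp with three values
        return {BOOLEANS[type(node.op)]: [_convert(v) for v in node.values]}
    if (isinstance(node, ast.Compare) and len(node.ops) == 1
            and isinstance(node.left, ast.Name)
            and isinstance(node.comparators[0], ast.Constant)):
        field, value = node.left.id, node.comparators[0].value
        operator = type(node.ops[0])
        if operator is ast.Eq:
            return {field: value}  # equality needs no operator in Mongo
        if operator in COMPARISONS:
            return {field: {COMPARISONS[operator]: value}}
    raise ValueError("unsupported expression: " + ast.dump(node))
```

Reusing Python's own parser means operator precedence and string quoting come for free; anything outside the supported subset raises a `ValueError` rather than producing a surprising query.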
PQL – Schema!
>>> pql.find('name == "foo"',
schema={'first_name': pql.StringField(),
'last_name': pql.StringField()})
Traceback (most recent call last):
...
ParseError: Field not found: name.
options: ['first_name', 'last_name']
PQL - Aggregations
Car listing:
    {made_on: ISODate("1973-03-24T00:00:02.013Z"), price: 21000}
Number of cars and total of prices per year in 1970-1990:
    > from pql import project, match, group
    > collection.aggregate(
          project(made_on='year(made_on)', price='price') |
          match('made_on >= 1970 and made_on <= 1990') |
          group(_id='made_on', count='sum(1)', total='sum(price)'))
PQL - Aggregations
Compare to this:
    > collection.aggregate([
          {'$project': {'made_on': {'$year': '$made_on'}, 'price': '$price'}},
          {'$match': {'made_on': {'$gte': 1970, '$lte': 1990}}},
          {'$group': {'_id': '$made_on', 'count': {'$sum': 1}, 'total': {'$sum': '$price'}}}])
Write fewer characters:
    > project(price='base * tax + commision')
    [{'$project': {'price': {'$add': [{'$multiply': ['$base', '$tax']}, '$commision']}}}]
BSON != JSON
! ObjectID and Date are BSON specific!
  ! Convert them to strings
  ! Using a codec is better – symmetrical!
>>> import datetime
>>> import bson
>>> from bson import json_util
>>> json_util.dumps(datetime.datetime.now())
{"$date": 1367970875910}
>>> json_util.dumps(bson.ObjectId())
{"$oid": "51896a43b46551eff3f43594"}
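"Symmetrical" here means that whatever the encoder writes, the decoder can turn back into the original value. The sketch below imitates the `{"$date": ...}` shape with the standard `json` module only, so it runs without the `bson` package. It assumes timezone-aware UTC datetimes, and `encode`/`decode` are illustrative names rather than `json_util` internals:

```python
import datetime
import json

EPOCH = datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc)


def encode(value):
    """json.dumps `default` hook: datetime -> {"$date": milliseconds}."""
    if isinstance(value, datetime.datetime):
        delta = value - EPOCH  # requires a timezone-aware datetime
        millis = delta.days * 86400000 + delta.seconds * 1000 + delta.microseconds // 1000
        return {"$date": millis}
    raise TypeError("cannot encode {!r}".format(value))


def decode(obj):
    """json.loads `object_hook`: {"$date": milliseconds} -> datetime."""
    if set(obj) == {"$date"}:
        return EPOCH + datetime.timedelta(milliseconds=obj["$date"])
    return obj


original = {"created": datetime.datetime(2013, 5, 7, 23, 54, 35, 910000,
                                         tzinfo=datetime.timezone.utc)}
text = json.dumps(original, default=encode)
restored = json.loads(text, object_hook=decode)
```

Converting dates to plain strings would lose this property: the reader could no longer tell a date from a string that merely looks like one.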
Python != JSON
            JSON Document           Python Dictionary
Key type    Only strings            Anything hashable
Key order   Ordered (in MongoDB)    Unordered (before Python 3.7)
Example: user id to name mapping
Python:
    {1234: 'Alon Horev', 1038: 'John Wayne'}
Javascript:
    [{'id': 1234, 'name': 'Alon Horev'}, {'id': 1038, 'name': 'John Wayne'}]
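The key-type difference is easy to demonstrate: integer keys come back from JSON as strings, while the list-of-records layout keeps the id's type. A short sketch; `to_records` and `from_records` are illustrative helpers:

```python
import json

names_by_id = {1234: "Alon Horev", 1038: "John Wayne"}

# Integer keys do not survive a JSON round trip: they come back as strings.
round_tripped = json.loads(json.dumps(names_by_id))


def to_records(mapping):
    """{id: name} -> [{'id': ..., 'name': ...}], which keeps the id's type."""
    return [{"id": key, "name": value} for key, value in mapping.items()]


def from_records(records):
    """Inverse of to_records."""
    return {record["id"]: record["name"] for record in records}


records = json.loads(json.dumps(to_records(names_by_id)))
```

After the round trip, `from_records(records)` equals the original dictionary, whereas `round_tripped` has the keys `"1234"` and `"1038"` as strings.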
Python != JSON
db.users.ensureIndex({'friends.id': 1})
db.users.insert({friends: [{id: 123, name: 'foo'}]})
db.users.find({'friends.id': 123}).explain()
{
"cursor": "BtreeCursor friends.id_1",
...
}
References
http://python-eve.org/ - A new RESTful API for MongoDB written in Python
http://flask.pocoo.org/ - A great Python web framework
https://github.com/alonho/pql - The PQL query translator
https://github.com/micha/resty - resty enhances curl for RESTful API calls
Learn from others! Twitter and Facebook have great RESTful APIs