When?, Why? and What? of MongoDB
Flavio [FlaPer87] Percoco [email protected]
twitter: @flaper87
What?
Why?
When?
Sunday, May 8, 2011
When?
Logging!
Statistics!
Dictionaries!
Spidering!
Queues!
Why?
* Lots of writes! (Logging, Statistics, Queues)
* Lots of reads! (Dictionaries, Queues)
* Unstructured data! (Spidering)
* [JB]SON-like, document-oriented API (All)
What?
* Make sure you create the right indexes

    import pymongo

    # let's get our collection (connection is an already open pymongo connection)
    collection = connection['dictionaries']['it']

    # let's ensure there's an index on the "word" key
    collection.ensure_index([("word", pymongo.ASCENDING)])

    def insert_word(word, data):
        # upsert: replace the document for this word, or insert it if missing
        collection.update({'word': word}, data, upsert=True)
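
The calls above are the PyMongo API of the time. `ensure_index` and `update` were removed in PyMongo 4, so here is a minimal sketch of the same idea with `create_index` and `update_one`. Assumptions not in the slides: the `prepare_collection` helper, the `unique=True` option, passing the collection as an argument, and using `$set` instead of replacing the whole document.

```python
def prepare_collection(collection):
    """Create a unique index on "word" so lookups and upserts avoid a full scan."""
    # 1 is the value of pymongo.ASCENDING; it is used directly so this
    # sketch does not need to import pymongo.
    # unique=True is an assumption: one document per word.
    return collection.create_index([("word", 1)], unique=True)


def insert_word(collection, word, data):
    """Insert the word, or update its fields if it is already stored."""
    # $set only overwrites the given fields; on insert, the "word" key
    # from the filter is copied into the new document.
    return collection.update_one({"word": word}, {"$set": data}, upsert=True)
```

Usage would look like `prepare_collection(collection)` once at startup, then `insert_word(collection, "ciao", {"en": "hello"})` for each word.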
What?
* Make sure you save what you really need

    import time
    import urlparse  # Python 2; use urllib.parse in Python 3

    def parse(response):
        url_netloc = urlparse.urlsplit(response.url).netloc
        crawled = {
            "url": response.url,
            "base_url": url_netloc,
            "content": response.body_as_unicode(),
            "status": response.status,
            "encoding": response.encoding,
            "headers": response.headers,
            "lastcrawl": time.time(),
        }
        # upsert: one document per URL, replaced on every crawl
        collection.update({'url': response.url}, crawled, upsert=True)
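
One way to read "save what you really need" is to trim the record before it reaches the database. The sketch below is only an illustration: `build_crawl_record`, the `KEEP_HEADERS` whitelist and the rule of dropping the body for non-200 responses are all assumptions, not something the slides prescribe.

```python
import time
from urllib.parse import urlsplit

# Hypothetical whitelist: the only response headers worth storing.
KEEP_HEADERS = ("content-type", "last-modified", "etag")


def build_crawl_record(url, status, headers, body, now=None):
    """Build the document to store for one crawled page, keeping only what is needed."""
    # Keep whitelisted headers only, with lower-cased names.
    wanted = {k.lower(): v for k, v in headers.items() if k.lower() in KEEP_HEADERS}
    return {
        "url": url,
        "base_url": urlsplit(url).netloc,
        "status": status,
        "headers": wanted,
        # Assumption: the body of an error page is not worth the space.
        "content": body if status == 200 else None,
        "lastcrawl": time.time() if now is None else now,
    }
```

The `now` parameter exists only so the timestamp can be fixed when testing.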
What?
* Make sure you understand that schemaless != mess

    logs = [
        {'url': "http://www.google.com", "time": 1304336526.011287},
        {'address': "http://www.yahoo.com", "time": 1304336551.0424709},
    ]

    # messy: the same piece of data ends up stored under two different keys
    def insert_log():
        for log in logs:
            collection.insert(log)

    # better: map every log onto one schema before inserting it
    def insert_log():
        for log in logs:
            log_to_insert = {
                "url": log.get('url', log.get('address')),
                "time": log.get('time'),
            }
            collection.insert(log_to_insert)
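
Since the database will not enforce a schema, the application can. A minimal sketch of such a check, assuming a hypothetical `REQUIRED_FIELDS` table and `validate_log` helper (neither is from the slides):

```python
# Hypothetical schema for a log entry: field name -> expected type.
REQUIRED_FIELDS = {"url": str, "time": float}


def validate_log(log):
    """Return a list of problems found in a log entry; an empty list means it is valid."""
    problems = []
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in log:
            problems.append("missing field: " + name)
        elif not isinstance(log[name], expected_type):
            problems.append("wrong type for field: " + name)
    # Flag keys outside the schema, such as 'address' used in place of 'url'.
    for name in log:
        if name not in REQUIRED_FIELDS:
            problems.append("unexpected field: " + name)
    return problems
```

Running it before every insert would catch the `'address'` entry from the example logs instead of letting it reach the collection.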
What?
* “Relate” what you occasionally need, “Embed” what you always need

    message = {
        'msg': "This is a test message",
        'time': time.time(),
        'user': {
            'username': 'flaper87',
            'email': '[email protected]',
        },
    }
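
A sketch of how both halves of the rule could live in one document: the fields shown with every message are embedded, and the rest of the user profile is related through its id. The `make_message` helper, the `user_id` key and the `EMBEDDED_USER_FIELDS` tuple are assumptions for illustration.

```python
import time

# Hypothetical choice: the user fields needed every time a message is shown.
EMBEDDED_USER_FIELDS = ("username", "email")


def make_message(text, user, now=None):
    """Build a message document that embeds a small user summary and relates the rest."""
    return {
        "msg": text,
        "time": time.time() if now is None else now,
        # Embed: always needed together with the message.
        "user": {field: user[field] for field in EMBEDDED_USER_FIELDS},
        # Relate: the full profile is fetched by id only when it is needed.
        "user_id": user["_id"],
    }
```

The trade-off is that embedded fields are copies: if a user changes email, old messages keep the old value unless they are updated too.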
What?
* ObjectIDs have an embedded datetime, so sorting by _id returns the oldest message first

    def _get(self, queue):
        try:
            # atomically fetch and remove the oldest message in the queue
            msg = self.client.database.command(
                "findandmodify", "messages",
                query={"queue": queue},
                sort={"_id": pymongo.ASCENDING},
                remove=True)
        except errors.OperationFailure as exc:
            # an empty queue is reported as a failed command
            if "No matching object found" in exc.args[0]:
                raise Empty()
            raise
        return deserialize(msg["value"]["payload"])
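
The embedded datetime is the first 4 bytes of the 12-byte ObjectID: a big-endian count of seconds since the Unix epoch. A minimal sketch that decodes it from the 24-character hex form, without the driver; the `objectid_creation_time` name is hypothetical. With PyMongo installed, `ObjectId.generation_time` from the `bson` package gives the same value.

```python
from datetime import datetime, timezone


def objectid_creation_time(oid_hex):
    """Return the creation time (UTC) encoded in an ObjectID given as a hex string."""
    if len(oid_hex) != 24:
        raise ValueError("an ObjectID is 12 bytes, i.e. 24 hex characters")
    # The first 4 bytes (8 hex characters) are seconds since the Unix epoch.
    seconds = int(oid_hex[:8], 16)
    return datetime.fromtimestamp(seconds, tz=timezone.utc)
```

The resolution is one second, so IDs created within the same second are ordered by the remaining bytes rather than by time.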
Let's talk about MongoDB!!
Thanks!!
Thanks 10gen!!