When?, Why? and What? of MongoDB

Flavio [FlaPer87] Percoco [email protected]

twitter: @flaper87

When? Why? What?

Sunday, May 8, 2011


When?

Logging!

Statistics!

Dictionaries!

Spidering!

Queues!


Why?

* Unstructured data! (Spidering)

* Lots of reads! (Dictionaries, Queues)

* Lots of writes! (Logging, Statistics, Queues)

* [JB]SON-like document-oriented API (All)


What?

* Make sure you create the right indexes

    import pymongo

    # let's get our collection
    collection = connection['dictionaries']['it']

    # let's ensure there's an index for the key "word"
    collection.ensure_index([("word", pymongo.ASCENDING)])

    def insert_word(word, data):
        # upsert: update the document for this word, or insert it if missing
        collection.update({'word': word}, data, upsert=True)
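The snippet on this slide uses the PyMongo API of the time. As a rough sketch of the same idea against newer PyMongo releases (assuming PyMongo 3 or later, where `create_index` and `replace_one` are available), the index and the upsert could look like the following. The helper name `setup_word_index`, the extra `collection` parameter and the `unique=True` option are assumptions for illustration, not part of the talk.

```python
ASCENDING = 1  # same value as pymongo.ASCENDING


def setup_word_index(collection):
    """Create (if missing) an index on the 'word' key."""
    # unique=True is an assumption: one document per word
    return collection.create_index([("word", ASCENDING)], unique=True)


def insert_word(collection, word, data):
    """Insert the document for `word`, or fully replace it if it exists."""
    # make sure the stored document carries the key we index and query on
    doc = dict(data, word=word)
    return collection.replace_one({"word": word}, doc, upsert=True)
```

With a real connection this would be called as `setup_word_index(connection['dictionaries']['it'])` once at startup, then `insert_word(...)` per word.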


What?

* Make sure you save what you really need

    import time
    from urllib.parse import urlsplit

    def parse(response):
        url_netloc = urlsplit(response.url).netloc
        crawled = {
            "url": response.url,
            "base_url": url_netloc,
            "content": response.body_as_unicode(),
            "status": response.status,
            "encoding": response.encoding,
            "headers": response.headers,
            "lastcrawl": time.time(),
        }
        # third argument is upsert=True: one document per URL
        collection.update({'url': response.url}, crawled, True)
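As one possible reading of this advice, the crawler could whitelist what it stores instead of saving the whole response. A minimal sketch, assuming only a few headers matter to the application; `KEEP_HEADERS` and `build_crawl_doc` are hypothetical names, not part of the talk.

```python
import time
from urllib.parse import urlsplit

# Hypothetical whitelist: the only headers this crawler cares about
KEEP_HEADERS = ("content-type", "last-modified", "etag")


def build_crawl_doc(url, status, encoding, headers, body, now=None):
    """Build the document to store, keeping only whitelisted headers."""
    wanted = {
        name.lower(): value
        for name, value in headers.items()
        if name.lower() in KEEP_HEADERS
    }
    return {
        "url": url,
        "base_url": urlsplit(url).netloc,
        "content": body,
        "status": status,
        "encoding": encoding,
        "headers": wanted,
        "lastcrawl": time.time() if now is None else now,
    }
```

The resulting dictionary can be passed to the same upsert call used in `parse`.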


What?

* Make sure you understand that schemaless != mess

    logs = [
        {'url': "http://www.google.com", "time": 1304336526.011287},
        {'address': "http://www.yahoo.com", "time": 1304336551.0424709},
    ]

    # mess: documents are stored as-is, so the keys differ ('url' vs 'address')
    def insert_log():
        for log in logs:
            collection.insert(log)

    # better: normalize every entry to the same keys before inserting
    def insert_log():
        for log in logs:
            log_to_insert = {
                "url": log.get('url', log.get('address')),
                "time": log.get('time'),
            }
            collection.insert(log_to_insert)



What?

* “Relate” what you occasionally need, “Embed” what you always need

    import time

    message = {
        'msg': "This is a test message",
        'time': time.time(),
        'user': {
            'username': 'flaper87',
            'email': '[email protected]',
        },
    }
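A small sketch of the difference, using plain dictionaries: the field needed every time a message is displayed is copied into the message, while the rest of the user profile stays in its own collection and is referenced by id. The `users` lookup table, the `user_id` key, the sample user and both helper functions are assumptions for illustration.

```python
import time

# Stand-in for a separate "users" collection, keyed by _id (sample data)
users = {
    1: {"_id": 1, "username": "alice", "theme": "dark"},
}


def build_message(text, user, now=None):
    """Embed what is always needed, relate the rest by id."""
    return {
        "msg": text,
        "time": time.time() if now is None else now,
        # embedded: needed every time the message is rendered
        "user": {"username": user["username"]},
        # related: looked up only when the full profile is needed
        "user_id": user["_id"],
    }


def load_profile(message):
    """Follow the reference to the full user document."""
    return users[message["user_id"]]
```

Rendering a message never touches `users`; only the occasional profile view pays for the second lookup.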


What?

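* ObjectIDs have an embedded datetime

    from queue import Empty  # assumed: the standard library's queue.Empty

    import pymongo
    from pymongo import errors

    # deserialize() is the payload decoder used by the original code (not shown)
    def _get(self, queue):
        try:
            # sorting by _id pops the oldest message first,
            # because ObjectIDs start with a creation timestamp
            msg = self.client.database.command(
                "findandmodify", "messages",
                query={"queue": queue},
                sort={"_id": pymongo.ASCENDING},
                remove=True)
        except errors.OperationFailure as exc:
            if "No matching object found" in exc.args[0]:
                raise Empty()
            raise
        return deserialize(msg["value"]["payload"])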
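To make the embedded datetime concrete: the first 4 of an ObjectID's 12 bytes are a big-endian timestamp in seconds since the Unix epoch, which is why sorting by `_id` roughly sorts by creation time. The sketch below decodes it with the standard library only; the function names are hypothetical and the input is assumed to be the 24-character hex form of an ObjectID.

```python
from datetime import datetime, timezone


def objectid_timestamp(oid_hex):
    """Return the creation time (seconds since the epoch) stored in an ObjectID."""
    if len(oid_hex) != 24:
        raise ValueError("an ObjectID is 24 hex characters long")
    # the first 4 bytes (8 hex characters) are a big-endian timestamp
    return int(oid_hex[:8], 16)


def objectid_datetime(oid_hex):
    """Return the creation time as a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(objectid_timestamp(oid_hex), tz=timezone.utc)
```

With PyMongo installed, `bson.ObjectId` exposes the same value through its `generation_time` attribute, so this helper is only needed when working with raw hex strings.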


Let's talk about MongoDB!!

Thanks!!

Thanks 10gen!!

