Some notes on NoSQL, in particular MongoDB
Bettina Berendt (with thanks to Matthijs van Leeuwen for some of the slides)
8 December 2015
Overview: NoSQL
‘Not only SQL’
No tables, but storage of, e.g.
– Collections of documents (e.g. JSON, XML)
– Key-value pairs
– Columns of values
– Graphs
– Objects
– …
NoSQL
Advantages
– Horizontally scalable (as opposed to vertically)
– No static schema or data model
– Cheaper to maintain
Disadvantages
– Possibilities can be very system-specific: there is no universal query language
– Often, some coding is necessary
– Fewer and/or weaker theoretical guarantees
Example systems: NoSQL
MongoDB, the most popular document store (see https://en.wikipedia.org/wiki/MongoDB and the references there)
MongoDB is “schema-free”!
Understanding the MongoDB / NoSQL notion of “document”
– A good example of what computer scientists call “semi-structured data” (see previous week)
– But actually fairly structured in comparison to, e.g., a textual document:
• MongoDB’s format is called BSON, a binary form of JSON
• See https://en.wikipedia.org/wiki/JSON, https://en.wikipedia.org/wiki/BSON
– Note: JSON can be thought of as an alternative to XML, as described for example on the (certainly not disinterested) http://www.json.org/xml.html, but not the type of XML you often see for annotating texts, as for example in the Letters of 1916 project
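To make the notion of a “document” concrete, here is a minimal sketch in plain JavaScript (runnable in Node.js, no MongoDB needed). The record and all its field names are invented for illustration; they are not taken from the Letters of 1916 project.

```javascript
// A hypothetical document, written as JSON text.
const text = `{
  "title": "Letter to a friend",
  "year": 1916,
  "author": { "name": "A. Writer", "city": "Dublin" },
  "keywords": ["family", "travel"]
}`;

// Parsing turns the text into nested fields, sub-documents and arrays.
const doc = JSON.parse(text);

console.log(doc.author.city);      // a field inside a sub-document
console.log(doc.keywords.length);  // an array-valued field

// A second document with different fields: nothing forces the two
// to share a schema, which is the sense of "schema-free" here.
const other = { title: "Postcard", sender: "unknown" };
console.log(Object.keys(other));
```

The structure (named fields, nesting, arrays) is explicit, which is what makes such data "fairly structured" compared with running text.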
INSERT a row (SQL) vs. insert a document (MongoDB)
db.inventory.insert(
  {
    item: "ABC1",
    details: {
      model: "14Q3",
      manufacturer: "XYZ Company"
    },
    stock: [ { size: "S", qty: 25 }, { size: "M", qty: 50 } ],
    category: "clothing"
  }
)
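SELECT (SQL) vs. find documents (MongoDB)
// documents whose type is 'food' or 'snacks'
db.inventory.find( { type: { $in: [ 'food', 'snacks' ] } } )
// documents whose type is 'food' and whose price is below 9.95
db.inventory.find( { type: 'food', price: { $lt: 9.95 } } )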
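What the two filters select can be sketched in plain JavaScript over an in-memory array. This only imitates the matching semantics ($in means "one of the listed values", and several conditions in one filter are combined with AND); it is not how MongoDB evaluates queries, and the sample documents are made up.

```javascript
// Hypothetical documents standing in for the inventory collection.
const inventory = [
  { item: "apple",  type: "food",     price: 0.5 },
  { item: "crisps", type: "snacks",   price: 1.2 },
  { item: "caviar", type: "food",     price: 49.0 },
  { item: "shirt",  type: "clothing", price: 9.0 }
];

// Like find({ type: { $in: ['food', 'snacks'] } })
// SQL: SELECT * FROM inventory WHERE type IN ('food', 'snacks')
const foodOrSnacks = inventory.filter(d => ["food", "snacks"].includes(d.type));

// Like find({ type: 'food', price: { $lt: 9.95 } })
// SQL: SELECT * FROM inventory WHERE type = 'food' AND price < 9.95
const cheapFood = inventory.filter(d => d.type === "food" && d.price < 9.95);

console.log(foodOrSnacks.map(d => d.item));
console.log(cheapFood.map(d => d.item));
```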
SELECT and SORT
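As a sketch of selecting and sorting: in the mongo shell a sort is chained onto find, for example db.inventory.find( { type: 'food' } ).sort( { price: 1 } ), where 1 means ascending and -1 descending (in SQL: ORDER BY price ASC). The plain JavaScript below imitates that on made-up data, so it runs without a database; the field names and values are assumptions for illustration.

```javascript
// Hypothetical documents standing in for the inventory collection.
const inventory = [
  { item: "caviar", type: "food",   price: 49.0 },
  { item: "apple",  type: "food",   price: 0.5 },
  { item: "crisps", type: "snacks", price: 1.2 },
  { item: "bread",  type: "food",   price: 2.5 }
];

// Select the food items, then order them by ascending price.
// filter() returns a new array, so the original order is left untouched.
const foodByPrice = inventory
  .filter(d => d.type === "food")
  .sort((a, b) => a.price - b.price);

console.log(foodByPrice.map(d => d.item));
```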
UPDATE (SQL) vs. update documents (MongoDB)
db.inventory.update(
  { item: "MNO2" },
  {
    $set: {
      category: "apparel",
      details: { model: "14Q3", manufacturer: "XYZ" }
    },
    $currentDate: { lastModified: true }
  }
)
Other useful constructs ...
... such as GROUP BY are also available (see Wikipedia description)
... and Python interfaces exist.
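To illustrate the GROUP BY idea: in MongoDB, grouping is expressed with the aggregation pipeline's $group stage, for example db.inventory.aggregate( [ { $group: { _id: "$category", count: { $sum: 1 } } } ] ), which corresponds roughly to SELECT category, COUNT(*) FROM inventory GROUP BY category. The plain JavaScript below imitates the result on made-up documents; it is a sketch of the semantics, not of MongoDB's implementation.

```javascript
// Hypothetical documents standing in for the inventory collection.
const inventory = [
  { item: "ABC1", category: "clothing" },
  { item: "MNO2", category: "apparel" },
  { item: "XYZ3", category: "clothing" }
];

// Count documents per category: one counter per distinct group key.
const counts = {};
for (const d of inventory) {
  counts[d.category] = (counts[d.category] || 0) + 1;
}

console.log(counts);
```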
Setting indexes in MongoDB: Usage (1): BSON structure
Given the following document in the users collection
{
  "_id" : ObjectId(...),
  "name" : "Alice",
  "age" : 27,
  "score" : 25
}
the following command creates an index on the score field:
// 1 = ascending order
db.users.createIndex( { "score" : 1 } )
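Why an index helps can be sketched in plain JavaScript. MongoDB keeps its indexes in B-tree structures; the sorted array with binary search below is a deliberate simplification of that idea, and all users except Alice are invented for illustration.

```javascript
// Hypothetical users; only Alice's values come from the example above.
const users = [
  { name: "Alice", age: 27, score: 25 },
  { name: "Bob",   age: 31, score: 40 },
  { name: "Carol", age: 22, score: 10 },
  { name: "Dave",  age: 45, score: 33 }
];

// "Index" on score: (score, position) pairs kept in ascending order,
// loosely analogous to createIndex({ score: 1 }).
const scoreIndex = users
  .map((u, pos) => ({ score: u.score, pos }))
  .sort((a, b) => a.score - b.score);

// Binary search: position in the index of the first entry with score >= min.
function lowerBound(index, min) {
  let lo = 0, hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (index[mid].score < min) lo = mid + 1; else hi = mid;
  }
  return lo;
}

// Users with score >= 30: located without scanning every document,
// and returned already sorted by score.
const start = lowerBound(scoreIndex, 30);
const highScorers = scoreIndex.slice(start).map(e => users[e.pos].name);

console.log(highScorers);
```

Because the index is already ordered, a query that both filters and sorts on the indexed field can read its results straight off the index.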
Usage (2)
SELECT and SORT (shown with reference to an index)
Importance for DHers?
– Certainly growing, but probably not necessary for everyone
My personal rule of thumb:
– Very useful if
• you already know the query you need (for example because you have worked it out on a small data sample, with SQL, Python, or whatever), and
• you need to process LOTS of data
– Less useful for very exploratory analysis, since there you may need a universal query language.