

Some notes on NoSQL, in particular MongoDB

Bettina Berendt (with thanks to Matthijs van Leeuwen for some of the slides)

8 December 2015


Overview: NoSQL

‘Not only SQL’

No tables, but storage of, e.g.
– Collections of documents (e.g. JSON, XML)
– Key-value pairs
– Columns of values
– Graphs
– Objects
– …


NoSQL

Advantages
– Horizontally scalable (as opposed to vertically)
– No static schema or data model
– Cheaper in maintenance

Disadvantages
– Possibilities can be very system-specific, so there is no universal query language
– Often, some coding is necessary
– Fewer and/or weaker theoretical guarantees


Example systems: NoSQL


MongoDB, the most popular system for document stores (see https://en.wikipedia.org/wiki/MongoDB and references there)

MongoDB is “schema-free”!


Understanding the MongoDB / NoSQL notion of “document”

– Good example of what computer scientists call “semi-structured data” (see previous week)

– But actually fairly structured in comparison to, e.g., a textual document:
• MongoDB’s format is called BSON, a binary form of JSON
• See https://en.wikipedia.org/wiki/JSON, https://en.wikipedia.org/wiki/BSON

– Note: JSON can be thought of as an alternative to XML, as described for example on the (certainly not disinterested) http://www.json.org/xml.html, but not the type of XML you often see for annotating texts, as for example in the Letters of 1916 project


INSERT a row (SQL): insert a document (MongoDB)

db.inventory.insert(
  {
    item: "ABC1",
    details: { model: "14Q3", manufacturer: "XYZ Company" },
    stock: [ { size: "S", qty: 25 }, { size: "M", qty: 50 } ],
    category: "clothing"
  }
)
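To make the "document" notion concrete, here is a minimal Python sketch that treats the inserted document as a nested dictionary and round-trips it through JSON. It uses only the standard library and does not talk to MongoDB; the variable names are chosen for illustration.

```python
import json

# The document from the insert example, written as a Python dict.
document = {
    "item": "ABC1",
    "details": {"model": "14Q3", "manufacturer": "XYZ Company"},
    "stock": [{"size": "S", "qty": 25}, {"size": "M", "qty": 50}],
    "category": "clothing",
}

# Serialise to JSON text and parse it back again.
as_json = json.dumps(document, indent=2)
restored = json.loads(as_json)

# Nested fields are reached by walking the structure.
manufacturer = restored["details"]["manufacturer"]
total_qty = sum(entry["qty"] for entry in restored["stock"])

print(as_json)
print(manufacturer, total_qty)
```

One document holds an embedded sub-document (details) and a list of sub-documents (stock), which in a relational design would typically be spread over several tables.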


SELECT (SQL) find documents (MongoDB)

db.inventory.find( { type: { $in: [ 'food', 'snacks' ] } } )

db.inventory.find( { type: 'food', price: { $lt: 9.95 } } )
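In SQL terms, the two queries correspond roughly to WHERE type IN ('food', 'snacks') and WHERE type = 'food' AND price < 9.95. The following plain-Python sketch imitates what such filter documents mean. The helper matches and the sample data are made up for illustration; the helper handles only equality, $in and $lt, and it is not how MongoDB evaluates queries internally.

```python
def matches(document, query):
    """Return True if `document` satisfies every condition in `query`.

    Hypothetical helper: supports only equality, $in and $lt.
    """
    for field, condition in query.items():
        if field not in document:
            return False
        value = document[field]
        if isinstance(condition, dict):
            # Operator form, e.g. {"$lt": 9.95}
            for operator, operand in condition.items():
                if operator == "$in":
                    if value not in operand:
                        return False
                elif operator == "$lt":
                    if not value < operand:
                        return False
                else:
                    raise ValueError(f"unsupported operator: {operator}")
        elif value != condition:
            # Plain value: equality test
            return False
    return True


# Made-up sample data, for illustration only.
inventory = [
    {"item": "apple", "type": "food", "price": 0.5},
    {"item": "caviar", "type": "food", "price": 40.0},
    {"item": "crisps", "type": "snacks", "price": 1.2},
    {"item": "shirt", "type": "clothing", "price": 9.0},
]

food_or_snacks = [d["item"] for d in inventory
                  if matches(d, {"type": {"$in": ["food", "snacks"]}})]
cheap_food = [d["item"] for d in inventory
              if matches(d, {"type": "food", "price": {"$lt": 9.95}})]

print(food_or_snacks)
print(cheap_food)
```

Several conditions in one filter document are combined with a logical AND, which is why the second query returns only food items below the price limit.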


SELECT and SORT


UPDATE (SQL): update documents (MongoDB)

db.inventory.update(
  { item: "MNO2" },
  {
    $set: {
      category: "apparel",
      details: { model: "14Q3", manufacturer: "XYZ" }
    },
    $currentDate: { lastModified: true }
  }
)
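The effect of the two update operators can be imitated in plain Python. This is a sketch only: apply_update is a hypothetical helper, the starting document is made up, and the code does not contact a database. It illustrates that $set overwrites each named field with the given value (so an embedded document such as details is replaced as a whole), and that $currentDate stores the current time.

```python
from datetime import datetime, timezone


def apply_update(document, update):
    """Hypothetical helper: apply $set and $currentDate to a dict in place."""
    for field, value in update.get("$set", {}).items():
        document[field] = value  # overwrite the whole field
    for field, enabled in update.get("$currentDate", {}).items():
        if enabled:
            document[field] = datetime.now(timezone.utc)
    return document


# Made-up starting document, for illustration only.
doc = {
    "item": "MNO2",
    "category": "clothing",
    "details": {"model": "13Q1", "manufacturer": "ABC", "colour": "red"},
}

apply_update(doc, {
    "$set": {
        "category": "apparel",
        "details": {"model": "14Q3", "manufacturer": "XYZ"},
    },
    "$currentDate": {"lastModified": True},
})

print(doc)
```

Note that the made-up colour entry disappears: because details is set as a whole, the old embedded document is not merged with the new one.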


Other useful constructs ...

... such as GROUP BY are also available (see Wikipedia description)

... and Python interfaces exist.
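As a rough idea of what grouping computes, the sketch below sums quantities per category in plain Python. The data are made up, and the loop only imitates the result of a GROUP BY; in MongoDB itself this kind of query is written with the aggregation framework ($group), and from Python one would normally go through a driver such as PyMongo rather than loop over dictionaries by hand.

```python
from collections import defaultdict

# Made-up sample data, for illustration only.
inventory = [
    {"item": "ABC1", "category": "clothing", "qty": 75},
    {"item": "MNO2", "category": "apparel", "qty": 10},
    {"item": "XYZ3", "category": "clothing", "qty": 5},
]

# Equivalent in spirit to:
#   SELECT category, SUM(qty) FROM inventory GROUP BY category
totals = defaultdict(int)
for doc in inventory:
    totals[doc["category"]] += doc["qty"]
totals = dict(totals)

print(totals)
```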


Setting indexes in MongoDB: Usage (1)

BSON structure

Given the following document in the users collection

{
  "_id" : ObjectId(...),
  "name" : "Alice",
  "age" : 27,
  "score" : 25
}

the following command creates an index on the score field:

db.users.createIndex( { "score" : 1 } )
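Why an ascending index on score helps can be sketched in plain Python. The sketch swaps in a sorted list with binary search (the standard bisect module) for the tree structure a database would actually use, so it is an analogy and not MongoDB's implementation. All users other than Alice, and the helper find_score_gt, are made up for illustration.

```python
import bisect

users = [
    {"name": "Alice", "age": 27, "score": 25},
    {"name": "Bob", "age": 31, "score": 12},    # made-up
    {"name": "Carol", "age": 22, "score": 40},  # made-up
    {"name": "Dave", "age": 45, "score": 31},   # made-up
]

# The "index": (score, position in users) pairs, sorted ascending by score.
score_index = sorted((u["score"], pos) for pos, u in enumerate(users))
score_keys = [score for score, _ in score_index]


def find_score_gt(threshold):
    """Names of users with score > threshold, in ascending score order."""
    # Binary search finds the first entry above the threshold,
    # so the collection itself is never scanned in full.
    start = bisect.bisect_right(score_keys, threshold)
    return [users[pos]["name"] for _, pos in score_index[start:]]


print(find_score_gt(25))
```

Because the index entries are already ordered by score, a range query and a sort on score come out of the same structure without an extra sorting step.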


Usage (2)


SELECT and SORT (shown with reference to an index)


Importance for DHers?

– Certainly growing, but probably not necessary for everyone

My personal rule of thumb:
– Very useful if
• you already know the query you need (for example because you have worked it out on a small data sample, with SQL, Python, or whatever), and
• you need to process LOTS of data
– Less useful for very exploratory analysis, since there you may need a universal query language.