cluster of unreliable commodity hardware (couchdb) (2)


Upload: namitha-acharya

Post on 11-Apr-2017


Presented by Namitha Acharya

CLUSTER OF UNRELIABLE COMMODITY HARDWARE

(COUCHDB)

1. What is CouchDB?
2. ACID semantics
3. Futon
4. How can I see my data?
5. REST API
6. Views
7. MapReduce dialog
8. MapReduce in CouchDB
9. Reduce/rereduce
10. Restrictions on MapReduce
11. Conflict management
12. Database replication
13. Security
14. Enterprises using CouchDB
15. Syntax
16. CouchDB vs. MongoDB

WHAT IS COUCHDB

◎ CouchDB was first released in 2005.

◎ It is an open source database, developed by Damien Katz, a former Lotus Notes developer at IBM.

◎ Damien Katz defined it as a "storage system for a large scale object database".

◎ He self-funded the project for almost two years and released it as an open source project under the GNU General Public License.

◎ It focuses on ease of use and completely embracing the web.

◎ It is a NoSQL database.

◎ A document database server, accessible via a RESTful JSON API.

◎ Ad-hoc and schema-free with a flat address space.


◎ Distributed, featuring robust, incremental replication with bi-directional conflict detection and management.

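◎ In 2011 CouchOne, the company Damien Katz founded around CouchDB, merged with Membase to form Couchbase; Apache CouchDB itself remains a separate project.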

◎ It uses JavaScript as its query language, by way of MapReduce.

◎ It uses HTTP for its API.

◎ Its distinguishing feature is that it provides multi-master replication.

◎ It became an Apache project in 2008.

◎ Unlike a relational database, it does not store data and relationships in tables.

◎ Instead, each database is a collection of independent documents.

◎ Each document maintains its own data and self-contained schema.

◎ An application may access multiple databases, for example one stored on a user’s mobile phone and another on a server.

◎ Document metadata contains revision information, making it possible to merge differences that occurred while the databases were disconnected.

ACID SEMANTICS

◎ CouchDB implements MVCC (Multi-Version Concurrency Control), which avoids the need to lock the database file during writes.

◎ Reads also use MVCC: each client sees a consistent snapshot of the database from the beginning to the end of the read operation.

◎ CouchDB can handle a high volume of concurrent readers and writers without conflict.
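
The lock-free write path described above can be illustrated with a small in-memory model. This is only a sketch under stated assumptions, not CouchDB's implementation: the `TinyStore` class, its `put` and `get` methods, and the integer revisions are hypothetical stand-ins. A real CouchDB server answers a write based on a stale revision with HTTP 409.

```python
class ConflictError(Exception):
    """Raised when a write is based on a stale revision."""


class TinyStore:
    """Hypothetical in-memory model of revision-checked writes (not CouchDB code)."""

    def __init__(self):
        self._docs = {}  # doc id -> (revision number, body)

    def get(self, doc_id):
        rev, body = self._docs[doc_id]
        # Return a copy so the reader keeps a stable snapshot.
        return rev, dict(body)

    def put(self, doc_id, body, rev=None):
        current = self._docs.get(doc_id)
        current_rev = current[0] if current else None
        if rev != current_rev:
            # The writer did not see the latest revision: reject instead of locking.
            raise ConflictError(f"{doc_id}: expected rev {current_rev}, got {rev}")
        new_rev = (current_rev or 0) + 1
        self._docs[doc_id] = (new_rev, dict(body))
        return new_rev


store = TinyStore()
rev1 = store.put("album-1", {"title": "Hello"})
snapshot_rev, snapshot = store.get("album-1")  # a reader takes a snapshot
rev2 = store.put("album-1", {"title": "Hello again"}, rev=rev1)

try:
    store.put("album-1", {"title": "Stale write"}, rev=rev1)  # based on the old revision
    stale_write_rejected = False
except ConflictError:
    stale_write_rejected = True
```

The reader's snapshot is unaffected by the later write, and the second writer is told to retry rather than being made to wait on a lock.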

FUTON

◎ Administration is supported by a built-in web application called Futon.

◎ First version released in 2005.

◎ CouchDB is written in the Erlang programming language and is cross-platform software available on various operating systems.

◎ CouchDB belongs to the document-oriented database category and is available under the Apache License (couchdb.apache.org).

HOW CAN I SEE MY DATA?

◎ CouchDB design documents can contain a “views” section.

◎ Views contain Map/Reduce functions.

◎ Map/Reduce functions are implemented in JavaScript.
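
As a rough illustration of the bullets above, a design document with a "views" section might look like the following when built as a Python dictionary. The database name `albums`, the design document name, and the view name `by_artist` are assumptions made for this sketch; the map function is kept as JavaScript source text because that is how it is stored in the document.

```python
import json

# Hypothetical design document; "_design/albums" and "by_artist" are made-up names.
design_doc = {
    "_id": "_design/albums",
    "language": "javascript",
    "views": {
        "by_artist": {
            # The map function is stored as JavaScript source text.
            "map": "function (doc) { if (doc.artist) { emit(doc.artist, doc.title); } }",
            # "_count" is one of CouchDB's built-in reduce functions.
            "reduce": "_count",
        }
    },
}

payload = json.dumps(design_doc, indent=2)
print(payload)
```

Under these assumptions the document would be stored with a PUT to `/albums/_design/albums`, and the view would then be read with a GET on `/albums/_design/albums/_view/by_artist`.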


REST API

◎ All items have a unique URI that gets exposed via HTTP.

◎ REST uses the HTTP methods POST, GET, PUT and DELETE for the four basic CRUD (Create, Read, Update, Delete) operations on all resources.
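
The mapping from HTTP methods to CRUD operations can be sketched with Python's standard library. The requests below are only built, never sent, so no server is needed. The base URL matches the local server assumed elsewhere in these slides, while the document id `example-id` and the revision `1-abc` are placeholders invented for illustration.

```python
import json
import urllib.request

BASE = "http://127.0.0.1:5984"  # assumed local server, as in the slides


def build_request(method, path, body=None):
    """Build (but do not send) an HTTP request for a CouchDB-style resource."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(BASE + path, data=data, headers=headers, method=method)


# One request per CRUD operation; the document id and revision are placeholders.
crud = {
    "create": build_request("POST", "/albums", {"title": "Hello", "artist": "World"}),
    "read": build_request("GET", "/albums/example-id"),
    "update": build_request("PUT", "/albums/example-id", {"title": "Hello", "_rev": "1-abc"}),
    "delete": build_request("DELETE", "/albums/example-id?rev=1-abc"),
}

for name, req in crud.items():
    print(name, req.get_method(), req.full_url)
```

Passing any of these objects to `urllib.request.urlopen` would send it to a running server.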

VIEWS

◎ Filtering the documents in your database to find those relevant to a particular process.

◎ Building efficient indexes to find documents by any value or structure that resides in them.

◎ Extracting data from your documents and presenting it in a specific order.

◎ Using these indexes to represent relationships among documents.

◎ CouchDB can index views and keep those indexes updated as documents are added, removed, or updated.


MAP/REDUCE DIALOG

◎ Bob: So, how do I query the database?

◎ IT guy: It’s not a database. It’s a key-value store.

◎ Bob: OK, it’s not a database. How do I query it?

◎ IT guy: You write a distributed map-reduce function in Erlang.

◎ Bob: Did you just tell me to go screw myself?

◎ IT guy: I believe I did, Bob.

MAP/REDUCE IN COUCHDB

◎ Map functions have a single parameter, a document, and emit a list of key/value pairs of JSON values.

◉ CouchDB allows arbitrary JSON structures to be used as keys.

◎ Map is called for every document in the database.

◉ Efficiency?

◎ The emit() function can be called multiple times in the map function.

◎ View results are stored in B-trees.
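
The behaviour described above can be mimicked in a few lines of Python. This is a stand-in for illustration only: real CouchDB map functions are JavaScript and call a global `emit()`, whereas here `emit` is passed in explicitly, and the helper `run_map` and the sample documents are hypothetical.

```python
def run_map(map_fn, docs):
    """Call map_fn once per document and collect everything it emits, sorted by key."""
    rows = []
    for doc in docs:
        def emit(key, value, _doc=doc):
            rows.append({"id": _doc["_id"], "key": key, "value": value})
        map_fn(doc, emit)
    # A real view index keeps rows ordered by key in a B-tree; a sort stands in for that.
    return sorted(rows, key=lambda row: (row["key"], row["id"]))


def tags_map(doc, emit):
    """Emit one row per tag, so a single document can produce several rows."""
    for tag in doc.get("tags", []):
        emit(tag, 1)


docs = [
    {"_id": "a", "title": "Hello", "tags": ["pop", "rock"]},
    {"_id": "b", "title": "World", "tags": ["rock"]},
    {"_id": "c", "title": "Untagged"},
]

rows = run_map(tags_map, docs)
```

Document `a` contributes two rows, document `c` contributes none, and the rows come back ordered by key.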

REDUCE/REREDUCE

◎ The reduce function is optional.

◎ It is used to produce aggregate results for that view.

◎ A reduce function must accept, as input, results emitted by its corresponding map function as well as results returned by the reduce function itself (rereduce).

◎ On rereduce, the keys argument is null.

◎ On a large database, the objects to be reduced are sent to your reduce function in batches. These batches are broken up on B-tree boundaries, which may occur in arbitrary places.
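
A counting reduce shows why the function has to handle both cases. The sketch below is Python rather than the JavaScript CouchDB actually runs, and the fixed `batch_size` is an assumption standing in for the B-tree boundaries mentioned above; `reduce_in_batches` is a hypothetical driver, not a CouchDB API.

```python
def count_reduce(keys, values, rereduce):
    """Count rows; on rereduce, add up partial counts from earlier reduce calls."""
    if rereduce:
        # keys is None here; values holds outputs of previous reduce calls.
        return sum(values)
    return len(values)


def reduce_in_batches(rows, batch_size):
    """Reduce rows batch by batch, then combine the partial results with rereduce."""
    partials = []
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        keys = [(row["key"], row["id"]) for row in batch]
        values = [row["value"] for row in batch]
        partials.append(count_reduce(keys, values, False))
    return count_reduce(None, partials, True)


rows = [{"id": f"doc{i}", "key": "rock", "value": 1} for i in range(7)]
total = reduce_in_batches(rows, batch_size=3)
```

Returning `len(values)` in the rereduce branch would count the batches instead of the rows, which is the classic mistake this two-case shape guards against.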

RESTRICTIONS ON MAP/REDUCE

◎ Map functions must be referentially transparent: given the same document, a map function always emits the same key/value pairs.

◉ This allows for incremental updates.

◎ A reduce function must be able to reduce its own output.

◉ This requirement allows CouchDB to store intermediate reductions directly in the inner nodes of B-tree indexes, so view index updates and retrievals have logarithmic cost.

CONFLICT MANAGEMENT

◎ Conflicts are left to the application to resolve. Resolution usually involves:

1. Merging data into one of the documents.

2. Deleting the stale one.

◎ Multi-Version Concurrency Control (MVCC)

◎ CouchDB does not attempt to merge the conflicting revisions; this is left to the application.

◎ If there is a conflict in revisions between nodes

◉ App is ultimately responsible for resolving the conflict

◉ All revisions are saved

◉ One revision is deterministically selected as the winning (current) revision
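
One way an application might carry out the merge-then-delete steps is sketched below. The merge policy (the winner's values take priority, missing fields are filled in from the losing revisions) is an assumption chosen for this example, and the function name and revision strings are made up. The returned `{"docs": [...]}` shape mirrors the body accepted by CouchDB's `_bulk_docs` endpoint.

```python
def resolve_conflict(winner, losers):
    """Merge conflicting revisions into the winner and mark the others deleted.

    Merge policy (an assumption for this sketch): fields missing from the
    winner are copied from the losing revisions; the winner's values win.
    """
    merged = dict(winner)
    for loser in losers:
        for field, value in loser.items():
            if not field.startswith("_"):
                merged.setdefault(field, value)
    # Deleting a losing revision means writing a tombstone for that revision.
    tombstones = [
        {"_id": loser["_id"], "_rev": loser["_rev"], "_deleted": True}
        for loser in losers
    ]
    return {"docs": [merged] + tombstones}


winner = {"_id": "album-1", "_rev": "2-aaa", "title": "Hello"}
losers = [{"_id": "album-1", "_rev": "2-bbb", "title": "Hi", "artist": "World"}]
payload = resolve_conflict(winner, losers)
```

A real application would pick a merge policy that suits its data; the point here is only that both steps are the application's job.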


DATABASE REPLICATION

◎ “CouchDB has built-in conflict detection and management and the replication process is incremental and fast, copying only documents and individual fields changed since the previous replication.”

◎ Each replication is a unidirectional process, from a source to a target; two-way synchronization runs one replication in each direction.

◎ Databases in CouchDB have a sequence number that gets incremented every time the database is changed.
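
How a sequence number makes replication incremental can be shown with a toy model. Everything here is a hypothetical simplification (the `Database` class, the `replicate` function, and the integer checkpoint are invented for this sketch), and it ignores revisions and conflicts entirely.

```python
class Database:
    """Hypothetical in-memory database with an update sequence (not CouchDB code)."""

    def __init__(self):
        self.update_seq = 0
        self.docs = {}      # doc id -> body
        self.last_seq = {}  # doc id -> sequence number of its latest change

    def save(self, doc_id, body):
        self.update_seq += 1  # every change bumps the database sequence number
        self.docs[doc_id] = dict(body)
        self.last_seq[doc_id] = self.update_seq

    def changes_since(self, seq):
        """Return ids of documents changed after the given sequence number."""
        return sorted(d for d, s in self.last_seq.items() if s > seq)


def replicate(source, target, checkpoint):
    """Copy documents changed since the checkpoint; return (copied ids, new checkpoint)."""
    changed = source.changes_since(checkpoint)
    for doc_id in changed:
        target.save(doc_id, source.docs[doc_id])
    return changed, source.update_seq


source, target = Database(), Database()
source.save("a", {"title": "Hello"})
source.save("b", {"title": "World"})
first, checkpoint = replicate(source, target, checkpoint=0)

source.save("b", {"title": "World (remastered)"})
second, checkpoint = replicate(source, target, checkpoint)
```

The second run copies only the document that changed after the first checkpoint, which is the sense in which replication is incremental.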


SECURITY

◎ Authorizations

◉ Reader - read/write documents

◉ Database Admin - compact, add/edit views

◉ Server Admin - create and remove databases


◎ Eventual Consistency

◉ CouchDB guarantees eventual consistency to be able to provide both availability and partition tolerance.

◎ Built for Offline

◉ CouchDB can replicate to devices (like smartphones) that can go offline and handle data sync for you when the device is back online.

ENTERPRISES THAT USED OR ARE USING COUCHDB

◎ Ubuntu began using it in 2009 for its synchronization service "Ubuntu One".

◎ The BBC, for its dynamic content platforms.

◎ Credit Suisse, for internal use in its commodities department for its marketplace framework.

◎ Meebo, for its social platform (web and applications). Meebo was acquired by Google and was shut down on July 12, 2012.

◎ Sophos, for some of its back-end systems.


ACCESSING DATA VIA HTTP

◎ Applications interact with CouchDB via HTTP.

◎ The following demonstrates a few examples using cURL, a command-line utility.

◎ These examples assume that CouchDB is running on localhost (127.0.0.1) on port 5984.

◎ Each example gives the action, the request, and the response.

CHECK SERVER

◎ Request: curl http://127.0.0.1:5984/

◎ Response:

    {
      "couchdb": "Welcome",
      "uuid": "85fb71bf700c17267fef77535820e371",
      "vendor": {
        "name": "The Apache Software Foundation",
        "version": "1.5.0"
      },
      "version": "1.5.0"
    }

CREATING A DATABASE

◎ Request: curl -X PUT http://127.0.0.1:5984/albums

◎ Response: { "ok": true }

INSERTING A DOCUMENT

◎ Request: curl -X PUT http://127.0.0.1:5984/albums/<uuid> -d '{"title":"Hello", "artist":"World"}'

◎ Response:

    {
      "ok": true,
      "id": "<uuid>",
      "rev": "1-2902191555"
    }

COUCHDB VS. MONGODB

◎ Erlang vs. C++

◎ JSON vs. BSON

◎ HTTP vs. Custom Protocol over TCP/IP

◎ Documents vs. Collections/Documents

◎ Ranged Query vs. Object-based Query

◎ MR -> View vs. MR -> Collection

◎ MVCC vs. Update in Place

◎ Master-Master vs. Master-Slave


Thank You