Navigating the Transition from Relational to NoSQL Technology
DESCRIPTION
While the hype surrounding NoSQL (non-relational) database technology has become deafening, there is real substance beneath the often exaggerated claims. NoSQL database technologies have emerged as a better match for the needs of modern interactive applications with cost-effective data management. Developers accustomed to relational database technology need to approach things differently. To view Couchbase webinars on-demand visit http://www.couchbase.com/webinars
TRANSCRIPT
1
Navigating the Transition from Relational to NoSQL Technology
Dipti Borkar, Senior Product Manager
2
WHY TRANSITION TO NOSQL?
3
Survey: Two big drivers for NoSQL adoption
What is the biggest data management problem driving your use of NoSQL in the coming year?
• Lack of flexibility/rigid schemas: 49%
• Inability to scale out data: 35%
• High latency/low performance: 29%
• Costs: 16%
• All of these: 12%
• Other: 11%
Source: Couchbase NoSQL Survey, December 2011, n=1351
4
Are you being impacted by these?
Schema rigidity problems
• Do you store serialized objects in the database?
• Do you have lots of sparse tables with very few columns being used by most rows?
• Do you find that your application developers require schema changes frequently due to constantly changing data?
• Are you using your database as a key-value store?
Scalability problems
• Do you periodically need to upgrade systems to more powerful servers and scale up?
• Are you reaching the read / write throughput limit of a single database server?
• Is your server's read / write latency not meeting your SLA?
• Is your user base growing at a frightening pace?
5
DISTRIBUTED DOCUMENT DATABASES
6
Document Databases
• Each record in the database is a self-describing document
• Each document has an independent structure
• Documents can be complex
• All document databases require a unique key for each document
• Documents are stored using JSON or XML or their derivatives
• Content can be indexed and queried
• Offer auto-sharding for scaling and replication for high availability
{
  "UUID": "21f7f8de-8051-5b89-86…",
  "Time": "2011-04-01T13:01:02.42…",
  "Server": "A2223E",
  "Calling Server": "A2213W",
  "Type": "E100",
  "Initiating User": "[email protected]",
  "Details": {
    "IP": "10.1.1.22",
    "API": "InsertDVDQueueItem",
    "Trace": "cleansed",
    "Tags": ["SERVER", "US-West", "API"]
  }
}
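As a minimal sketch of the points above, the following Python snippet stores two differently shaped documents as JSON under unique keys. The `bucket` dictionary, the `put`/`get` helpers, and the key names are hypothetical stand-ins for illustration, not any product's API.

```python
import json

# Hypothetical in-memory stand-in for a document database bucket:
# unique document key -> JSON text.
bucket = {}

def put(doc_id, document):
    """Store a document as JSON text under its unique key."""
    bucket[doc_id] = json.dumps(document)

def get(doc_id):
    """Fetch a document by key and decode it back into a dict."""
    return json.loads(bucket[doc_id])

# Two documents with independent structure live side by side.
put("log::1", {
    "Type": "E100",
    "Server": "A2223E",
    "Details": {"IP": "10.1.1.22", "Tags": ["SERVER", "US-West", "API"]},
})
put("user::7778", {"common_name": "Robert User", "nicknames": ["Bob", "Buddy"]})

log_entry = get("log::1")
```

Each document carries its own field names, so a reader needs no shared table definition to interpret it.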
7
Advantages of Document Databases
• Schema flexibility – Gives users application flexibility for evolving the system without restructuring existing data
• Dynamic elasticity – Moving data while maintaining consistency is easier
• Performance – Related data is stored in a single document. This allows consistently low-latency access to the data
• Query flexibility – Indexing allows users to query contents of documents
8
Advantages of Document Databases
1. Compare relational and document DB data models
2. Compare relational and document DB scaling models
9
COMPARING DATA MODELS
10
Relational vs Document data model
Relational data model: Highly-structured table organization with rigidly-defined data formats and record structure.
R1C1 R1C2 R1C3 R1C4
R2C1 R2C2 R2C3 R2C4
R3C1 R3C2 R3C3 R3C4
R4C1 R4C2 R4C3 R4C4
Document data model: Collection of complex documents with arbitrary, nested data formats and varying "record" format.
[Figure: several copies of the sample JSON log document shown under "Document Databases"]
11
Example: Error Logging Use case
Table 1: Error Log
KEY  ERR  TIME  DC
1    ERR  TIME  FK(DC2)
2    ERR  TIME  FK(DC2)
3    ERR  TIME  FK(DC2)
4    ERR  TIME  FK(DC3)
Table 2: Data Centers
KEY  LOC  NUM
1    DEN  303-223-2332
2    NYC  212-223-2332
3    SFO  415-223-2332
12
Document design with flexible schema
{ "ID": 4, "ERR": "Out of Memory", "TIME": "2004-09-16T23:59:58.75", "DC": "NYC", "NUM": "212-223-2332" }
{ "ID": 3, "ERR": "Out of Memory", "TIME": "2004-09-16T23:59:58.75", "DC": "NYC", "NUM": "212-223-2332" }
{ "ID": 2, "ERR": "Out of Memory", "TIME": "2004-09-16T23:59:58.75", "DC": "NYC", "NUM": "212-223-2332" }
{ "ID": 1, "ERR": "Out of Memory", "TIME": "2004-09-16T23:59:58.75", "DC": "NYC", "NUM": "212-223-2332" }
13
Document design with flexible schema
SCHEMA CHANGE (new COMPONENT and SEV fields)
{ "ID": 5, "ERR": "Out of Memory", "TIME": "2004-09-16T23:59:58.75", "COMPONENT": "DMS", "SEV": "LEVEL1", "DC": "NYC", "NUM": "212-223-2332" }
{ "ID": 4, "ERR": "Out of Memory", "TIME": "2004-09-16T23:59:58.75", "DC": "NYC", "NUM": "212-223-2332" }
{ "ID": 1, "ERR": "Out of Memory", "TIME": "2004-09-16T23:59:58.75", "DC": "NYC", "NUM": "212-223-2332" }
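A short sketch of what the schema change means for application code: older documents simply lack the new fields, so a reader can fall back to defaults instead of requiring a migration. The `summarize` helper and the default values "UNKNOWN" and "n/a" are assumptions made for illustration.

```python
import json

# An older document and a newer one that adds COMPONENT and SEV.
old_doc = json.loads('{"ID": 4, "ERR": "Out of Memory", "DC": "NYC"}')
new_doc = json.loads(
    '{"ID": 5, "ERR": "Out of Memory", "COMPONENT": "DMS", '
    '"SEV": "LEVEL1", "DC": "NYC"}'
)

def summarize(doc):
    """Format a log document, tolerating fields that older documents lack."""
    severity = doc.get("SEV", "UNKNOWN")      # assumed default for old docs
    component = doc.get("COMPONENT", "n/a")   # assumed default for old docs
    return f'{doc["ID"]}: {doc["ERR"]} [{severity}] ({component})'

old_summary = summarize(old_doc)
new_summary = summarize(new_doc)
```

Both document shapes are handled by the same code path, and existing documents are left untouched.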
14
Document modeling
When considering how to model data for a given application
• Think of a logical container for the data
• Think of how data groups together
Questions:
• Are these separate objects in the model layer?
• Are these objects accessed together?
• Do you need updates to these objects to be atomic?
• Are multiple people editing these objects concurrently?
15
Document Design Options
• One document that contains all related data
  – Data is de-normalized
  – Better performance and scale
  – Eliminate client-side joins
• Separate documents for different object types with cross references
  – Data duplication is reduced
  – Objects may not be co-located
  – Transactions supported only on a document boundary
  – Most document databases do not support joins
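The second option implies a client-side join: the application reads the parent document, then looks up each referenced document by ID. The sketch below assumes a hypothetical in-memory `store` and made-up document IDs.

```python
# Hypothetical in-memory store: document ID -> document.
store = {
    "post::hello": {"title": "Hello World",
                    "comments": ["comment::1", "comment::2"]},
    "comment::1": {"body": "Awesome post!"},
    "comment::2": {"body": "Like it."},
}

def load_post_with_comments(post_id):
    """Client-side join: one lookup for the post, one per referenced comment."""
    post = dict(store[post_id])  # shallow copy so the stored doc is untouched
    post["comments"] = [store[comment_id] for comment_id in post["comments"]]
    return post

joined = load_post_with_comments("post::hello")
```

With the single-document option the same data would arrive in one read, at the cost of a larger document that every comment update has to rewrite.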
16
Document ID / Key selection
• Documents are sharded based on the document ID
• ID-based document lookup is extremely fast
• Similar to primary keys in relational databases
• Usually an ID can only appear once in a bucket
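One generic way to shard on the document ID is to hash the ID and take the result modulo the number of shards. This is an illustrative sketch only; the use of CRC32 and the `shard_for` name are assumptions, not a description of any particular database.

```python
import zlib

def shard_for(doc_id: str, num_shards: int) -> int:
    """Map a document ID to a shard index using a stable hash of the ID."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards

# The same ID always maps to the same shard, so a lookup by ID
# can go straight to the one shard that holds the document.
shard = shard_for("auser_profile", 4)
```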
17
Document ID / Key selection
Options
• UUIDs, date-based IDs, numeric IDs
• Hand-crafted (human readable)
• Matching prefixes (for multiple related objects)
Questions:
• Do you have a unique way of referencing objects?
• Are related objects stored in separate documents?
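The key options can be sketched in a few lines of Python. The helper names are hypothetical; the prefixed form follows the "auser_profile" / "auser_friends" naming used in the user profile example in this deck.

```python
import uuid

def new_uuid_key() -> str:
    """Opaque, globally unique key; good when no natural ID exists."""
    return str(uuid.uuid4())

def related_key(owner: str, kind: str) -> str:
    """Hand-crafted key with a matching prefix for related documents."""
    return f"{owner}_{kind}"

# Related documents share a prefix, so either key can be
# computed from the user name without a query.
profile_key = related_key("auser", "profile")
friends_key = related_key("auser", "friends")
```

Because both keys are derivable from the user name, the application can fetch a user's friends list directly once it knows which user it is dealing with.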
18
Example: Data Profile for Users
{
  "_id": "auser_profile",
  "user_id": 7778,
  "password": "a1004cdcaa3191b7",
  "common_name": "Robert User",
  "nicknames": ["Bob", "Buddy"],
  "sign_up_timestamp": 1224612317,
  "last_login_timestamp": 1245613101
}
{
  "_id": "auser_friends",
  "friends": ["joe", "alan", "toru"]
}
19
Example: Entities for a Blog
• User profile – The main pointer into the user data
  – Blog entries
  – Badge settings, like a Twitter badge
• Blog posts – Contains the blogs themselves
• Blog comments – Comments from other users
20
Blog Document – Option 1 – Single document
{
  "_id": "jchris_Hello_World",
  "author": "jchris",
  "type": "post",
  "title": "Hello World",
  "format": "markdown",
  "body": "Hello from [Couchbase](http://couchbase.com).",
  "html": "<p>Hello from <a href=\"http: …",
  "comments": [
    {"format": "markdown", "body": "Awesome post!"},
    {"format": "markdown", "body": "Like it."}
  ]
}
21
Blog Document – Option 2 – Split into multiple docs
BLOG DOC
{
  "_id": "jchris_Hello_World",
  "author": "jchris",
  "type": "post",
  "title": "Hello World",
  "format": "markdown",
  "body": "Hello from [Couchbase](http://couchbase.com).",
  "html": "<p>Hello from <a href=\"http: …",
  "comments": [
    "comment1_jchris_Hello_World"
  ]
}
COMMENT
{
  "_id": "comment1_jchris_Hello_World",
  "format": "markdown",
  "body": "Awesome post!"
}
22
Threaded Comments
• You can imagine how to take this to a threaded list
[Diagram labels: Blog, List, First comment, List, Reply to comment, More Comments]
Advantages
• Only fetch the data when you need it
  – For example, rendering part of a web page
• Spread the data and load across the entire cluster
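A sketch of the threaded layout, under the assumption that each comment document holds a list of reply IDs: the renderer fetches replies only down to the depth the page needs. The `store`, `fetch`, and `render_thread` names and the `replies` field are hypothetical.

```python
# Hypothetical store: each comment document lists the IDs of its replies.
store = {
    "post": {"comments": ["c1", "c2"]},
    "c1": {"body": "First comment", "replies": ["c1r1"]},
    "c1r1": {"body": "Reply to comment", "replies": []},
    "c2": {"body": "More comments", "replies": []},
}

fetched = []  # records which documents were actually read

def fetch(doc_id):
    """Read one document by ID and note that it was read."""
    fetched.append(doc_id)
    return store[doc_id]

def render_thread(comment_ids, max_depth, depth=0):
    """Return indented comment lines, fetching replies only down to max_depth."""
    lines = []
    if depth >= max_depth:
        return lines
    for comment_id in comment_ids:
        comment = fetch(comment_id)
        lines.append("  " * depth + comment["body"])
        lines.extend(render_thread(comment["replies"], max_depth, depth + 1))
    return lines

# Render only the top-level comments; the reply document is never read.
top_level = render_thread(fetch("post")["comments"], max_depth=1)
```

Calling `render_thread` with a larger `max_depth` pulls in the replies when the page actually shows them.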
23
COMPARING SCALING MODEL
24
Modern interactive software architecture
Application Scales Out: Just add more commodity web servers
Database Scales Up: Get a bigger, more complex server
Note – Relational database technology is great for what it is great for, but it is not great for this.
25
NoSQL database matches application logic tier architecture
Data layer now scales with linear cost and constant performance.
Application Scales Out: Just add more commodity web servers
Database Scales Out: Just add more commodity data servers
Scaling out flattens the cost and performance curves.
NoSQL Database Servers
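To make "just add more commodity data servers" concrete, here is a generic sketch in which documents hash into a fixed number of partitions and adding a server moves only enough partitions to even out the load. The partition count, server names, and function names are assumptions for illustration, not a description of how any specific product rebalances.

```python
import zlib
from collections import Counter

NUM_PARTITIONS = 64  # assumed fixed partition count

def partition_of(doc_id):
    """Hash a document ID into one of the fixed partitions."""
    return zlib.crc32(doc_id.encode("utf-8")) % NUM_PARTITIONS

def initial_map(servers):
    """Spread partitions round-robin across the starting servers."""
    return {p: servers[p % len(servers)] for p in range(NUM_PARTITIONS)}

def add_server(partition_map, new_server):
    """Move just enough partitions to the new server to even out the load."""
    owned = Counter(partition_map.values())
    target = len(partition_map) // (len(owned) + 1)
    new_map = dict(partition_map)
    moved = 0
    for p in sorted(partition_map):
        if moved == target:
            break
        owner = partition_map[p]
        if owned[owner] > target:
            new_map[p] = new_server
            owned[owner] -= 1
            moved += 1
    return new_map

before = initial_map(["s1", "s2", "s3"])
after = add_server(before, "s4")
moved_partitions = [p for p in before if before[p] != after[p]]
```

In this run 16 of the 64 partitions change owner and each of the four servers ends up with 16, so documents in the other 48 partitions stay where they were.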
26
Other things to consider before transitioning
Accessing data
– Learn about the development API the database supports
– Check if the programming language of your choice is supported
Consistency
– Understand the consistency model and check if it meets your needs
– Analyze your application needs – do you need atomicity across multiple objects?
Availability
– Ensure that there is no single point of failure
– Understand the replication behavior and availability on node failures
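When atomicity is only needed within one document, a common generic technique is optimistic concurrency with a version check (often called compare-and-swap). The sketch below is an in-memory illustration with hypothetical class and method names, not any database's actual API.

```python
class VersionConflict(Exception):
    """Raised when a document changed since the caller last read it."""

class Store:
    def __init__(self):
        self._docs = {}  # doc_id -> (version, document)

    def get(self, doc_id):
        """Return (version, document) for the given ID."""
        return self._docs[doc_id]

    def put(self, doc_id, document, expected_version=None):
        """Write a document; reject the write if the version has moved on."""
        current_version = self._docs.get(doc_id, (0, None))[0]
        if expected_version is not None and expected_version != current_version:
            raise VersionConflict(doc_id)
        self._docs[doc_id] = (current_version + 1, document)
        return current_version + 1

store = Store()
store.put("post", {"likes": 0})
version, doc = store.get("post")
# Succeeds: nobody else has written since we read version 1.
store.put("post", {"likes": doc["likes"] + 1}, expected_version=version)
```

A second writer still holding the old version would get `VersionConflict` and would have to re-read and retry. Updates that must span several documents need more than this, which is the question the slide asks you to check.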
27
Other things to consider before transitioning
Operations
– Monitoring the system
– Backup and restore the system
– Upgrades and maintenance
– Support
Scaling
– Ease of adding and reducing capacity
– Application availability on topology changes
Maturity
– Does your application need rich database functionality? (multi-doc transactions, complex security needs, complex joins)
28
BRIEF OVERVIEW COUCHBASE SERVER
29
Couchbase Server
Simple. Fast. Elastic. NoSQL.
Couchbase automatically distributes data across commodity servers. Built-in caching enables apps to read and write data with sub-millisecond latency. And with no schema to manage, Couchbase effortlessly accommodates changing data management requirements.
30
Typical Couchbase production environment
Application users
Load Balancer
Application Servers
Servers
31
Reading and Writing
Reading Data
• Application Server to Server: "Give me document A"
• Server to Application Server: "Here is document A"
Writing Data
• Application Server to Server: "Please store document A"
• Server to Application Server: "OK, I stored document A"
32
Reading and Writing
[Same read and write exchanges as above; inside each server, document A is shown in both RAM and DISK]
33
Flow of data when writing
Writing Data
• Applications writing to Couchbase
• Couchbase transmitting replicas
• Couchbase writing to disk
[Diagram labels: Application Servers, network, Server, Replication queue, Disk write queue]
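The write flow can be modeled in a few lines. This is a deliberately simplified sketch: the `Server` class and its methods are hypothetical, and it assumes the server acknowledges a write once the document is in RAM, with replication and disk persistence drained from their queues afterwards.

```python
from collections import deque

class Server:
    """Toy model of one data server with RAM, disk, and two outgoing queues."""

    def __init__(self):
        self.ram = {}
        self.disk = {}
        self.replication_queue = deque()
        self.disk_write_queue = deque()

    def write(self, doc_id, document):
        """Keep the document in RAM, queue it for replication and disk."""
        self.ram[doc_id] = document
        self.replication_queue.append(doc_id)
        self.disk_write_queue.append(doc_id)
        return "OK"  # assumption: acknowledged before the queues are drained

    def read(self, doc_id):
        """Serve from RAM when possible, otherwise from disk."""
        if doc_id in self.ram:
            return self.ram[doc_id]
        return self.disk[doc_id]

    def flush_to_disk(self):
        """Drain the disk write queue."""
        while self.disk_write_queue:
            doc_id = self.disk_write_queue.popleft()
            self.disk[doc_id] = self.ram[doc_id]

    def replicate_to(self, replica):
        """Drain the replication queue into another server's RAM."""
        while self.replication_queue:
            doc_id = self.replication_queue.popleft()
            replica.ram[doc_id] = self.ram[doc_id]

primary, replica = Server(), Server()
ack = primary.write("A", {"body": "document A"})
```

After `write` returns, document A is readable from the primary's RAM while the disk and replica copies appear only once `flush_to_disk` and `replicate_to` have run.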