navigating the transition from relational to nosql technology

1

Navigating the Transition from Relational to NoSQL Technology

Dipti Borkar, Senior Product Manager

2

WHY TRANSITION TO NOSQL?

3

Survey: Two big drivers for NoSQL adoption

What is the biggest data management problem driving your use of NoSQL in the coming year?

•  Lack of flexibility/rigid schemas: 49%
•  Inability to scale out data: 35%
•  High latency/low performance: 29%
•  Costs: 16%
•  All of these: 12%
•  Other: 11%

Source: Couchbase NoSQL Survey, December 2011, n=1351

4

Are you being impacted by these?

Schema rigidity problems
•  Do you store serialized objects in the database?
•  Do you have lots of sparse tables with very few columns being used by most rows?
•  Do you find that your application developers require schema changes frequently due to constantly changing data?
•  Are you using your database as a key-value store?

Scalability problems
•  Do you periodically need to upgrade systems to more powerful servers and scale up?
•  Are you reaching the read/write throughput limit of a single database server?
•  Is your server's read/write latency not meeting your SLA?
•  Is your user base growing at a frightening pace?

5

DISTRIBUTED DOCUMENT DATABASES

6

Document Databases

•  Each record in the database is a self-describing document
•  Each document has an independent structure
•  Documents can be complex
•  Every document requires a unique key
•  Documents are stored using JSON or XML or their derivatives
•  Content can be indexed and queried
•  Offer auto-sharding for scaling and replication for high availability

    {
      "UUID": "21f7f8de-8051-5b89-86…",
      "Time": "2011-04-01T13:01:02.42…",
      "Server": "A2223E",
      "Calling Server": "A2213W",
      "Type": "E100",
      "Initiating User": "[email protected]",
      "Details": {
        "IP": "10.1.1.22",
        "API": "InsertDVDQueueItem",
        "Trace": "cleansed",
        "Tags": ["SERVER", "US-West", "API"]
      }
    }

7

Advantages of Document Databases

•  Schema flexibility – Gives users application flexibility for evolving the system without restructuring existing data
•  Dynamic elasticity – Moving data while maintaining consistency is easier
•  Performance – Related data is stored in a single document. This allows consistently low-latency access to the data
•  Query flexibility – Indexing allows users to query contents of documents

8

Advantages of Document Databases


1. Compare relational and document DB data models
2. Compare relational and document DB scaling models

9

COMPARING DATA MODELS

10

Relational vs Document data model

Relational data model
Highly structured table organization with rigidly defined data formats and record structure.

R1C1  R1C2  R1C3  R1C4
R2C1  R2C2  R2C3  R2C4
R3C1  R3C2  R3C3  R3C4
R4C1  R4C2  R4C3  R4C4

Document data model
Collection of complex documents with arbitrary, nested data formats and varying "record" format.

11

Example: Error Logging Use case

Table 1: Error Log

KEY  ERR  DC       TIME
1    ERR  FK(DC2)  TIME
2    ERR  FK(DC2)  TIME
3    ERR  FK(DC2)  TIME
4    ERR  FK(DC3)  TIME

Table 2: Data Centers

KEY  LOC  NUM
1    DEN  303-223-2332
2    NYC  212-223-2332
3    SFO  415-223-2332
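The two-table layout above can be folded into self-contained documents by resolving the foreign key once, at write time. The following is a minimal Python sketch using plain dicts in place of a real database. The names `data_centers`, `error_log`, and `to_document` are illustrative assumptions, and the ERR and TIME values are placeholders rather than data from the slide.

```python
# Table 2: Data Centers (KEY -> LOC, NUM), values as shown on the slide.
data_centers = {
    1: {"LOC": "DEN", "NUM": "303-223-2332"},
    2: {"LOC": "NYC", "NUM": "212-223-2332"},
    3: {"LOC": "SFO", "NUM": "415-223-2332"},
}

# Table 1: Error Log. ERR and TIME are placeholder values (assumption).
error_log = [
    {"KEY": 1, "ERR": "Out of Memory", "DC": 2, "TIME": "2004-09-16T23:59:58.75"},
    {"KEY": 4, "ERR": "Out of Memory", "DC": 3, "TIME": "2004-09-16T23:59:58.75"},
]


def to_document(row, centers):
    """Resolve the foreign key and embed the data center fields in the document."""
    dc = centers[row["DC"]]
    return {
        "ID": row["KEY"],
        "ERR": row["ERR"],
        "TIME": row["TIME"],
        "DC": dc["LOC"],
        "NUM": dc["NUM"],
    }


# Each resulting document can be read without a join.
documents = [to_document(row, data_centers) for row in error_log]
```

The trade-off is that the data center's location and number are duplicated into every error document, which is the de-normalization the document model accepts in exchange for single-lookup reads.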

12

Document design with flexible schema

    {
      "ID": 4,
      "ERR": "Out of Memory",
      "TIME": "2004-09-16T23:59:58.75",
      "DC": "NYC",
      "NUM": "212-223-2332"
    }




13


Document design with flexible schema

    {
      "ID": 5,
      "ERR": "Out of Memory",
      "TIME": "2004-09-16T23:59:58.75",
      "COMPONENT": "DMS",
      "SEV": "LEVEL1",
      "DC": "NYC",
      "NUM": "212-223-2332"
    }

SCHEMA CHANGE: the "COMPONENT" and "SEV" fields are new in this document.
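With a flexible schema, older documents are not rewritten when new fields appear, so application code has to tolerate both shapes. The following is a minimal Python sketch of that idea using the standard `json` module. The function name `summarize` and the default values are assumptions made for illustration.

```python
import json

# A document written before the schema change and one written after it.
old_doc = json.loads(
    '{"ID": 4, "ERR": "Out of Memory", "TIME": "2004-09-16T23:59:58.75",'
    ' "DC": "NYC", "NUM": "212-223-2332"}'
)
new_doc = json.loads(
    '{"ID": 5, "ERR": "Out of Memory", "TIME": "2004-09-16T23:59:58.75",'
    ' "COMPONENT": "DMS", "SEV": "LEVEL1", "DC": "NYC", "NUM": "212-223-2332"}'
)


def summarize(doc):
    """Format an error document, treating the newer fields as optional."""
    component = doc.get("COMPONENT", "unknown")   # absent in older documents
    severity = doc.get("SEV", "unspecified")      # absent in older documents
    return f'{doc["ID"]}: {doc["ERR"]} [{component}/{severity}]'


print(summarize(old_doc))  # 4: Out of Memory [unknown/unspecified]
print(summarize(new_doc))  # 5: Out of Memory [DMS/LEVEL1]
```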




14

Document modeling

When considering how to model data for a given application:
•  Think of a logical container for the data
•  Think of how data groups together

Questions to ask:
•  Are these separate objects in the model layer?
•  Are these objects accessed together?
•  Do you need updates to these objects to be atomic?
•  Are multiple people editing these objects concurrently?

15

Document Design Options

•  One document that contains all related data
   –  Data is de-normalized
   –  Better performance and scale
   –  Eliminate client-side joins

•  Separate documents for different object types with cross references
   –  Data duplication is reduced
   –  Objects may not be co-located
   –  Transactions supported only on a document boundary
   –  Most document databases do not support joins

16

Document ID / Key selection

•  Documents are sharded based on the document ID
•  ID-based document lookup is extremely fast
•  Similar to primary keys in relational databases
•  Usually an ID can only appear once in a bucket
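Sharding on the document ID can be pictured as a stable hash of the ID, reduced to a shard number. The following is a minimal Python sketch under that assumption. It uses CRC32 from the standard library purely for illustration, and the function name `shard_for` is hypothetical; it does not describe the partitioning scheme of any particular product.

```python
import zlib


def shard_for(doc_id: str, num_shards: int) -> int:
    """Map a document ID to a shard index using a stable hash (CRC32 here)."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


# Group a few example IDs by the shard they would land on.
placement = {}
for doc_id in ["auser_profile", "auser_friends", "jchris_Hello_World"]:
    placement.setdefault(shard_for(doc_id, 4), []).append(doc_id)
```

Because the shard follows directly from the ID, a lookup by ID needs no search: the client computes the shard and asks that server.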

17

Document ID / Key selection: options

Options
•  UUIDs, date-based IDs, numeric IDs
•  Hand-crafted (human readable)
•  Matching prefixes (for multiple related objects)

Questions to ask:
•  Do you have a unique way of referencing objects?
•  Are related objects stored in separate documents?
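The key options listed above can be sketched in a few lines of Python. The helper names (`uuid_key`, `date_key`, `related_key`) and the exact key formats are assumptions for illustration, not conventions required by any database.

```python
import uuid
from datetime import datetime


def uuid_key() -> str:
    """Opaque, globally unique key."""
    return str(uuid.uuid4())


def date_key(prefix: str, when: datetime) -> str:
    """Date-based key, for example for log entries."""
    return f"{prefix}_{when.strftime('%Y%m%dT%H%M%S')}"


def related_key(owner: str, kind: str) -> str:
    """Human-readable key sharing a prefix with the owner's other documents."""
    return f"{owner}_{kind}"


profile_key = related_key("auser", "profile")   # "auser_profile"
friends_key = related_key("auser", "friends")   # "auser_friends"
```

With matching prefixes, an application that knows the user name can derive the keys of all related documents without a query.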

18

Example: Data Profile for Users




    {
      "_id": "auser_profile",
      "user_id": 7778,
      "password": "a1004cdcaa3191b7",
      "common_name": "Robert User",
      "nicknames": ["Bob", "Buddy"],
      "sign_up_timestamp": 1224612317,
      "last_login_timestamp": 1245613101
    }

    {
      "_id": "auser_friends",
      "friends": ["joe", "alan", "toru"]
    }

19

Example: Entities for a Blog

•  User profile
   –  The main pointer into the user data
   –  Blog entries
   –  Badge settings, like a Twitter badge
•  Blog posts
   –  Contains the blogs themselves
•  Blog comments
   –  Comments from other users

20




Blog Document – Option 1 – Single document

    {
      "_id": "jchris_Hello_World",
      "author": "jchris",
      "type": "post",
      "title": "Hello World",
      "format": "markdown",
      "body": "Hello from [Couchbase](http://couchbase.com).",
      "html": "<p>Hello from <a href=\"http: …",
      "comments": [
        {"format": "markdown", "body": "Awesome post!"},
        {"format": "markdown", "body": "Like it."}
      ]
    }

21

Blog Document – Option 2 – Split into multiple docs

BLOG DOC

    {
      "_id": "jchris_Hello_World",
      "author": "jchris",
      "type": "post",
      "title": "Hello World",
      "format": "markdown",
      "body": "Hello from [Couchbase](http://couchbase.com).",
      "html": "<p>Hello from <a href=\"http: …",
      "comments": [
        "comment1_jchris_Hello_World"
      ]
    }

COMMENT

    {
      "_id": "comment1_jchris_Hello_World",
      "format": "markdown",
      "body": "Awesome post!"
    }
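When the blog post and its comments live in separate documents, the application resolves the references itself. The following is a minimal Python sketch of such a client-side join, using an in-memory dict as a stand-in for the database. The names `store` and `load_post_with_comments` are hypothetical, and a real client would issue one get per key (or a multi-get) instead of dict lookups.

```python
# In-memory stand-in for a key-value document store (assumption).
store = {
    "jchris_Hello_World": {
        "author": "jchris",
        "type": "post",
        "title": "Hello World",
        "comments": ["comment1_jchris_Hello_World"],
    },
    "comment1_jchris_Hello_World": {
        "format": "markdown",
        "body": "Awesome post!",
    },
}


def load_post_with_comments(store, post_id):
    """Fetch a post, then fetch each referenced comment by its ID."""
    post = dict(store[post_id])  # shallow copy so the stored document is untouched
    comment_ids = post.get("comments", [])
    # Skip references whose target document is missing.
    post["comments"] = [store[cid] for cid in comment_ids if cid in store]
    return post


post = load_post_with_comments(store, "jchris_Hello_World")
```

The extra round trips are the cost of Option 2. The benefit is that comments can be added or fetched without rewriting or reading the whole post.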

22

Threaded Comments

•  You can imagine how to take this to a threaded list

[Figure: threaded comments. Labels: Blog, List, First comment, List, Reply to comment, More Comments]

Advantages
•  Only fetch the data when you need it
   –  For example, rendering part of a web page
•  Spread the data and load across the entire cluster

23

COMPARING SCALING MODELS

24

Modern interactive software architecture

•  Application scales out: just add more commodity web servers
•  Database scales up: get a bigger, more complex server

Note – Relational database technology is great for what it is great for, but it is not great for this.

25

NoSQL database matches application logic tier architecture

Data layer now scales with linear cost and constant performance.

•  Application scales out: just add more commodity web servers
•  Database scales out: just add more commodity data servers

Scaling out flattens the cost and performance curves.

[Figure: NoSQL Database Servers]

26

Other things to consider before transitioning

Accessing data
–  Learn about the development API the database supports
–  Check if the programming language of your choice is supported

Consistency
–  Understand the consistency model and check if it meets your needs
–  Analyze your application needs – do you need atomicity across multiple objects?

Availability
–  Ensure that there is no single point of failure
–  Understand the replication behavior and availability on node failures

27

Other things to consider before transitioning

Operations
–  Monitoring the system
–  Backup and restore the system
–  Upgrades and maintenance
–  Support

Scaling
–  Ease of adding and reducing capacity
–  Application availability on topology changes

Maturity
–  Does your application need rich database functionality? (multi-doc transactions, complex security needs, complex joins)

28

BRIEF OVERVIEW COUCHBASE SERVER

29

Couchbase Server

Simple. Fast. Elastic. NoSQL.

Couchbase automatically distributes data across commodity servers. Built-in caching enables apps to read and write data with sub-millisecond latency. And with no schema to manage, Couchbase effortlessly accommodates changing data management requirements.

30

Typical Couchbase production environment

[Figure: Application users, Load Balancer, Application Servers, Servers]

31

Reading and Writing

Reading data
•  Application Server: Give me document A
•  Server: Here is document A

Writing data
•  Application Server: Please store document A
•  Server: OK, I stored document A

32

Reading and Writing

[Figure: the same read and write exchanges, with document A shown in both RAM and DISK on the server]

33

Flow of data when writing

Writing data
•  Applications writing to Couchbase
•  Couchbase writing to disk
•  Couchbase transmitting replicas

[Figure: application servers write to a server over the network; inside the server, a replication queue and a disk write queue]
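The write flow on this slide can be illustrated with a toy model: a write lands in RAM, then is handed to a replication queue and a disk write queue that are drained separately. The following Python sketch is an illustration only. The class `ToyServer` and its method names are hypothetical, and the assumption that the server acknowledges the write before the queues are drained is ours, not something the slide states.

```python
from collections import deque


class ToyServer:
    """Toy model of a server with a RAM cache, a disk, and two write queues."""

    def __init__(self):
        self.ram = {}
        self.disk = {}
        self.replication_queue = deque()
        self.disk_write_queue = deque()

    def set(self, key, doc):
        """Store in RAM, enqueue for replication and disk, then acknowledge."""
        self.ram[key] = doc
        self.replication_queue.append(key)
        self.disk_write_queue.append(key)
        return "OK"  # assumption: acknowledged before persistence completes

    def get(self, key):
        """Serve reads from RAM, falling back to disk."""
        return self.ram.get(key, self.disk.get(key))

    def flush_to_disk(self):
        """Drain the disk write queue."""
        while self.disk_write_queue:
            key = self.disk_write_queue.popleft()
            self.disk[key] = self.ram[key]

    def drain_replication(self, replica):
        """Send queued documents to a replica, which queues its own disk write."""
        while self.replication_queue:
            key = self.replication_queue.popleft()
            replica.ram[key] = self.ram[key]
            replica.disk_write_queue.append(key)


primary, replica = ToyServer(), ToyServer()
ack = primary.set("A", {"ERR": "Out of Memory"})
primary.flush_to_disk()
primary.drain_replication(replica)
```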

34

THANK YOU

[email protected]
