The Rise of NoSQL


Upload: arnd-kleinbeck

Post on 26-Jan-2015


DESCRIPTION

After a brief introduction to the history of database management systems, the different types of NoSQL data stores are characterized. Theoretical background on sharding mechanisms, horizontal scaling, and the CAP theorem is explained. After a comparison of different NoSQL stores, you will know the pros and cons of the different approaches and how to choose the best-fitting database for your project.

TRANSCRIPT

Page 1: The Rise of NoSQL

The Rise of NoSQL

Arnd Kleinbeck - September 2013

1

Page 2: The Rise of NoSQL

2

Page 3: The Rise of NoSQL

History

3

Page 4: The Rise of NoSQL

1980

1990

2000

2010

Rise of RDBMS

4

Page 5: The Rise of NoSQL

RDBMS

Persistence

Integration

SQL

ACID

Transactions

Tooling

5

Page 6: The Rise of NoSQL

Order: 4711

Customer: Max

Payment: Credit Card

Line items:

405 235 001540 987 326

6

Page 7: The Rise of NoSQL

7

Page 8: The Rise of NoSQL

Impedance

Mismatch

8

Page 9: The Rise of NoSQL

1980

1990

2000

2010

Rise of RDBMS

Rise of OODBMS

9

Page 10: The Rise of NoSQL

1980

1990

2000

2010

Rise of RDBMS

Rise of OODBMS

RDBMS Dominance

10

Page 11: The Rise of NoSQL

A New Era

11

Page 12: The Rise of NoSQL

600.000.000 tweets per day

12

Page 13: The Rise of NoSQL

1.100.000.000 active users per month

13

Page 14: The Rise of NoSQL

Not only size matters...

Data Volumes grow exponentially

Data gets more connected

Semi-Structured/ Unstructured Data

14

Page 15: The Rise of NoSQL

Lots of Traffic

15

Page 16: The Rise of NoSQL

16

Page 17: The Rise of NoSQL

17

Page 18: The Rise of NoSQL

SCALING UP

SCALING OUT

18

Page 19: The Rise of NoSQL

BigTable

Dynamo

19

Page 20: The Rise of NoSQL

1980

1990

2000

2010

Rise of RDBMS

Rise of OODBMS

RDBMS Dominance

Rise of NoSQL

20

Page 21: The Rise of NoSQL

Definition

21

Page 22: The Rise of NoSQL

„Not only SQL“

22

Page 23: The Rise of NoSQL

Characteristics

non relational

schemaless

open source

cluster friendly

21st Century Web

no joins

23

Page 24: The Rise of NoSQL

Differences

data model

APIs

consistency

data distribution

persistence

24

Page 25: The Rise of NoSQL

Data Models

25

Page 26: The Rise of NoSQL

26

Page 27: The Rise of NoSQL

Document

Column Family

Graph

Key-Value

27

Page 28: The Rise of NoSQL

Key Value

28

Page 29: The Rise of NoSQL

Key-Value

153245

153246

153247

. . .

. . .

29

Page 31: The Rise of NoSQL

Key Value Store Characteristics

Most simple data model

DB does not care about data types

Similar to persistent hash map

Fast lookups

Easy to distribute

Inspired by Amazon Dynamo paper

Restricted possibilities of querying
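
The "persistent hash map" comparison above can be sketched in a few lines. This is only an illustration, assuming Python's standard `dbm` module as the storage file; the class name `TinyKeyValueStore` and the sample key and value are hypothetical and not taken from any of the products discussed in this talk. Values are opaque bytes, so the store does not care about data types, and the only access path is the key.

```python
import dbm
import os
import tempfile


class TinyKeyValueStore:
    """Hypothetical minimal key-value store: a hash map persisted in a dbm file."""

    def __init__(self, path):
        self._path = path

    def put(self, key: str, value: bytes) -> None:
        # The value is stored as opaque bytes; the store never inspects it.
        with dbm.open(self._path, "c") as db:
            db[key.encode("utf-8")] = value

    def get(self, key: str, default=None):
        # Lookup by key is the only way to query the data.
        with dbm.open(self._path, "c") as db:
            return db.get(key.encode("utf-8"), default)


# Usage: the data survives because it lives in a file, not in process memory.
store = TinyKeyValueStore(os.path.join(tempfile.mkdtemp(), "orders"))
store.put("153245", b'{"customer": "Max"}')
print(store.get("153245"))
print(store.get("no-such-key"))
```

Finding all orders of a given customer would require scanning every value here, which is the "restricted possibilities of querying" named on the slide.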

31

Page 32: The Rise of NoSQL

Redis: Open Source Advanced Key Value Store

In-Memory Store with optional durability

Knows types like strings, hashes, lists, sets

BSD License

Implemented in C

Very small footprint (20k LOC for rel. 2.2)

APIs for C/C++, C#, Clojure, Lisp, Erlang, Go, Haskell, Java, JavaScript, Objective-C, Perl, PHP, Python, Ruby, ...

Used at Twitter, Instagram, flickr, stackoverflow, ...

32

Page 33: The Rise of NoSQL

Riak: Open Source Key Value Store

Highly available and fault-tolerant

Basho Technologies

Apache License

Implemented in Erlang

APIs for Java, Erlang, Ruby, PHP, Python, Clojure, C#, C/C++, HTTP, Node.js, Perl, Scala, Smalltalk, ...

Used at Mozilla, Comcast, AOL

33

Page 34: The Rise of NoSQL

Voldemort: Open Source Key Value Store

Big, distributed, persistent, fault-tolerant hash table

Developed by LinkedIn

Implemented in Java

Apache 2.0 License

Dynamo Scale Out

Used at LinkedIn

34

Page 35: The Rise of NoSQL

Document

35

Page 36: The Rise of NoSQL

{
    "id": "993174208",
    "tex": "texture wood pile",
    "in_reply_to_screen_name": "akleinbe",
    "in_reply_to_status_id_str": null,
    "id_str": "54691802283900928",
    "entities": {
        "user_mentions": [
            {
                "indices": [3, 19],
                "screen_name": "PostGradProblem",
                "id_str": "271572434",
                "name": "PostGradProblems",
                "id": 271572434
            }
        ],
        "urls": [ ],
        "hashtags": [ ]
    }
}

{
    "id": "596229751",
    "customer_id": "RT @PostGradProblem: In preparation for the NFL lockout, I will be spending twice as much time analyzing my fantasy baseball team during ...",
    "truncated": true,
    "in_reply_to_user_id": null,
    "in_reply_to_status_id": null,
    "favorited": false,
    "source": "<a href=\"http://twitter.com/\" rel=\"nofollow\">Twitter for iPhone</a>",
    "in_reply_to_screen_name": null,
    "in_reply_to_status_id_str": null,
    "id_str": "54691802283900928",
    "entities": {
        "user_mentions": [
            {
                "indices": [3, 19],
                "screen_name": "PostGradProblem",
                "id_str": "271572434",
                "name": "PostGradProblems",
                "id": 271572434
            }
        ],
        "urls": [ ],
        "hashtags": [ ]
    }
}

{
    "id": "3452094105",
    "user": {
        "notifications": null,
        "profile_use_background_image": true,
        "statuses_count": 31,
        "profile_background_color": "C0DEED",
        "followers_count": 3066,
        "profile_image_url": "http://a2.twimg.com/profile_images/1285770264/PGP_normal.jpg",
        "listed_count": 6,
        "profile_background_image_url": "http://a3.twimg.com/a/1301071706/images/themes/theme1/bg.png",
        "description": "",
        "screen_name": "PostGradProblem",
        "default_profile": true,
        "verified": false,
        "time_zone": null,
        "profile_text_color": "333333",
        "is_translator": false,
        "profile_sidebar_fill_color": "DDEEF6",
        "location": ""
    }
}

Document

36

Page 37: The Rise of NoSQL

Document Store Characteristics

You can query into document structure

You can use natural aggregates as documents

You can retrieve portions of a document

You can update portions of a document

You can have links between documents

Compared to key value data model the document is more transparent

No schema / implicit schema

Some queries are a pain in the neck!
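
What "querying into the document structure" and "updating portions of a document" mean can be illustrated with plain Python dictionaries. This is a rough sketch, not the query API of any real document store: the helpers `get_path`, `set_path` and `find` are hypothetical, and the sample documents are shortened, partly made-up variants of the tweet documents on the previous slide.

```python
def get_path(document, path):
    """Return the value at a dotted path such as 'user.screen_name', or None."""
    current = document
    for part in path.split("."):
        if isinstance(current, dict) and part in current:
            current = current[part]
        else:
            return None
    return current


def set_path(document, path, value):
    """Update one portion of a document without rewriting the rest."""
    parts = path.split(".")
    current = document
    for part in parts[:-1]:
        current = current.setdefault(part, {})
    current[parts[-1]] = value


def find(collection, path, expected):
    """Return all documents whose value at the given path equals 'expected'."""
    return [doc for doc in collection if get_path(doc, path) == expected]


# Illustrative sample data (values partly invented for this sketch).
tweets = [
    {"id": "3452094105", "user": {"screen_name": "PostGradProblem", "verified": False}},
    {"id": "596229751", "user": {"screen_name": "akleinbe"}},
]

matches = find(tweets, "user.screen_name", "PostGradProblem")
print([doc["id"] for doc in matches])

set_path(tweets[1], "user.verified", True)
print(get_path(tweets[1], "user.verified"))
```

A key-value store would treat each tweet as an opaque value; here the store can look inside it, which is what makes the document "more transparent".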

37

Page 38: The Rise of NoSQL

MongoDB: Open Source Document Store

„Most popular NoSQL database“

Stores JSON like documents

Implemented in C++

GNU AGPL License

APIs for C/C++, C#, Go, Erlang, Java, JavaScript, Node.js, Perl, PHP, Python, Ruby, Scala, HTTP/REST

Used at Craigslist, eBay, Foursquare, SourceForge, NYT, ...

38

Page 39: The Rise of NoSQL

CouchDB: Open Source Document Store

Ease of Use

No update locks

Stores JSON like documents

Implemented in Erlang

Apache License

APIs for JavaScript, MapReduce, HTTP/REST

Used at BBC, Credit Suisse, Meebo, ...

39

Page 40: The Rise of NoSQL

Couchbase: Open Source Distributed Document Store

Optimized for interactive applications

Merged from Membase and CouchDB

Implemented in C++, Erlang, C

Apache License / Proprietary

APIs for Java, .NET, PHP, Ruby, Python, C

Used at AOL, Cisco, LinkedIn, Salesforce.com, Zynga, ...

40

Page 41: The Rise of NoSQL

Schemaless

41

Page 42: The Rise of NoSQL

Schemaless

Schemaless is one of the main reasons of interest in NoSQL databases

Schemaless reduces ceremony

Schemaless increases flexibility

BUT...

42

Page 43: The Rise of NoSQL

Schemaless means implicit schema

To query specific attributes you have to know their names

Schema management is shifted from the DB to the code

http://martinfowler.com/articles/schemaless/
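
A small sketch may show what "implicit schema" means in practice. It assumes two hypothetical generations of order documents, where an older version used the field name `customer` and a newer one `customer_name`; both field names and the function `read_order` are invented for illustration. The database accepts either shape, so the reading code has to know and handle both.

```python
def read_order(document):
    """Apply the implicit schema in application code instead of in the database."""
    # Hypothetical history: older documents use "customer", newer ones "customer_name".
    customer = document.get("customer_name", document.get("customer"))
    if customer is None:
        # The database would not have rejected this document; the code has to.
        raise ValueError("order document has no customer field")
    return {
        "order": str(document["order"]),
        "customer": customer,
        "line_items": list(document.get("line_items", [])),
    }


old_style = {"order": 4711, "customer": "Max"}
new_style = {"order": "4712", "customer_name": "Max", "line_items": ["item-1"]}

print(read_order(old_style))
print(read_order(new_style))
```

The flexibility is real, since both documents can live side by side, but every reader of the data carries a piece of the schema.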

43

Page 44: The Rise of NoSQL

Column Family

44

Page 45: The Rise of NoSQL

Column-Family

http://www.oredev.org/videos/nosql--the-new-generation-of-agile-databases

45

Page 46: The Rise of NoSQL

Column Family Characteristics

more complicated data model

rich structure

single key (row key)

easy/fast access to columns/column families in a row

rows can contain 100s or 1000s of columns

aggregate oriented
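
The column family model can be pictured as a two-level map: a row key leads to column families, and each family holds an arbitrary set of columns. The following is a minimal in-memory sketch under that assumption; the class `ColumnFamilyTable` and the sample rows are hypothetical and do not reflect the API of any real column family store.

```python
class ColumnFamilyTable:
    """Hypothetical sketch: row key -> column family -> column name -> value."""

    def __init__(self):
        self._rows = {}

    def put(self, row_key, family, column, value):
        # Rows are sparse: each row only stores the columns it actually has.
        families = self._rows.setdefault(row_key, {})
        families.setdefault(family, {})[column] = value

    def get_family(self, row_key, family):
        # One lookup by row key returns a whole column family of that row.
        return dict(self._rows.get(row_key, {}).get(family, {}))

    def get_column(self, row_key, family, column, default=None):
        return self._rows.get(row_key, {}).get(family, {}).get(column, default)


table = ColumnFamilyTable()
table.put("4711", "customer", "name", "Max")
table.put("4711", "customer", "payment", "Credit Card")
table.put("4711", "line_items", "item-1", 2)   # item ids are made up
table.put("4712", "customer", "name", "Max")   # no line_items columns at all

print(table.get_family("4711", "customer"))
print(table.get_column("4712", "line_items", "item-1"))
```

Everything belonging to one order sits under one row key, which is the "aggregate oriented" aspect: the row is the unit that gets stored and distributed.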

46

Page 47: The Rise of NoSQL

Cassandra: Open Source Wide Column Store

Supports multi data center replication

Good for distributed DBs with massive write loads

Implemented in Java

Apache License 2.0

APIs for C#, C++, Clojure, Erlang, Go, Haskell, Java, JavaScript, Perl, PHP, Python, Ruby, Scala

Used at CERN, Facebook, Netflix, Rackspace, SoundCloud, Twitter ...

47

Page 48: The Rise of NoSQL

HBase: Open Source Column Oriented Database

Part of Hadoop, inspired by Google's BigTable

Implemented in Java

Apache License 2.0

APIs for RESTful HTTP, Thrift, C/C++, C#, Groovy, Java, PHP, Python, Scala

Used at Amazon, Adobe, AOL, Cloudspace, eBay, Facebook, IBM, Last.fm, LinkedIn, Spotify, Yahoo!, ...

48

Page 49: The Rise of NoSQL

Graph

49

Page 50: The Rise of NoSQL

Graph

http://www.neo4j.org/learn/graphdatabase

50

Page 51: The Rise of NoSQL

51

Page 52: The Rise of NoSQL

Graph DB Characteristics

Graph DBs disassemble things into fragments and relations

You can do very interesting queries on graph structures - things you cannot even think of in SQL

Good for complex graph-structured data

Fast lookups, fast traversing

Whiteboard friendly
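
The kind of traversal query a graph model makes natural can be sketched with a plain edge list and a breadth-first search. This is only an illustration of the idea, not how a graph database is implemented or queried; the people, the relation names and the function `within_hops` are all invented for the example.

```python
from collections import deque

# Hypothetical graph: (source node, relation, target node).
edges = [
    ("Anna", "KNOWS", "Ben"),
    ("Ben", "KNOWS", "Carla"),
    ("Carla", "KNOWS", "David"),
    ("Anna", "WORKS_AT", "Acme"),
]


def within_hops(start, relation, max_hops):
    """Return nodes reachable from 'start' via 'relation' in at most max_hops steps."""
    seen = {start}
    found = []
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for source, rel, target in edges:
            if source == node and rel == relation and target not in seen:
                seen.add(target)
                found.append(target)
                queue.append((target, depth + 1))
    return found


# "Friends and friends of friends" of Anna.
print(within_hops("Anna", "KNOWS", 2))
```

In a relational schema the same question needs one self-join per hop, so the query text changes with the depth; here the depth is just a parameter.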

52

Page 53: The Rise of NoSQL

Neo4j: Open Source Graph Database

Embedded, disk-based, fully transactional

Implemented in Java

GPLv3 and AGPLv3 / commercial

APIs for .NET, Clojure, Go, Groovy, Java, JavaScript, Perl, PHP, Python, Ruby, Scala

Used at Adobe, Cisco, Telekom...

53

Page 54: The Rise of NoSQL

OrientDB: Open Source Document Database with Graph oriented extensions

Supports SQL (without join) as query language

Supports ACID transactions

Implemented in Java

Apache License 2.0

Commercial support available

APIs for HTTP/REST, Java, JavaScript, Scala, PHP, Ruby, .NET, Clojure, Node.js, Python, ...

Used at SKY, Spielo, UltraDNS...

54

Page 55: The Rise of NoSQL

Scaling out

55

Page 56: The Rise of NoSQL

Replication

Master

Slave 1 Slave 2 Slave 3

write

read
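
The replication picture (writes go to the master, reads are served by the slaves) can be sketched as follows. The sketch assumes synchronous copying to three in-memory slaves for simplicity; real systems usually replicate asynchronously, and the class `ReplicatedStore` is hypothetical.

```python
import itertools


class ReplicatedStore:
    """Hypothetical master/slave sketch: one write target, several read targets."""

    def __init__(self, slave_count=3):
        self.master = {}
        self.slaves = [{} for _ in range(slave_count)]
        self._next_slave = itertools.cycle(range(slave_count))

    def write(self, key, value):
        # All writes go through the single master ...
        self.master[key] = value
        # ... and are copied to every slave (synchronously in this sketch).
        for slave in self.slaves:
            slave[key] = value

    def read(self, key):
        # Reads are spread round-robin over the slaves.
        return self.slaves[next(self._next_slave)].get(key)


store = ReplicatedStore()
store.write("4711", "Max")
print([store.read("4711") for _ in range(3)])
```

Adding slaves scales reads, but every write still passes through the one master, which is why replication alone does not help with massive write loads.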

56

Page 57: The Rise of NoSQL

Sharding

Shard 1 Shard 2 Shard 3

Router

write

read

57

Page 58: The Rise of NoSQL

Hashing Problems

Common way of choosing a server: server = hash(key) mod n

What happens if a server goes down?

Nearly every object gets hashed to a new location!
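
The effect can be checked with a short experiment. The sketch below assumes MD5 as a stable hash function (Python's built-in `hash` for strings differs between processes) and made-up object keys; it counts how many of 1000 keys land on a different server when the cluster shrinks from 4 to 3 servers. With uniformly distributed hashes, roughly three quarters of the keys would be expected to move, although only one quarter lived on the lost server.

```python
import hashlib


def stable_hash(key: str) -> int:
    """Hash that is identical across runs and machines (assumption: MD5)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


def server_for(key: str, server_count: int) -> int:
    # The scheme from the slide: server = hash(key) mod n
    return stable_hash(key) % server_count


keys = [f"object-{i}" for i in range(1000)]  # hypothetical object keys
before = {key: server_for(key, 4) for key in keys}
after = {key: server_for(key, 3) for key in keys}

moved = sum(1 for key in keys if before[key] != after[key])
print(f"{moved} of {len(keys)} keys changed servers")
```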

58

Page 59: The Rise of NoSQL

Consistent Hashing

Use the same hash function for both objects and servers

shards: A, B, C

objects: 1, 2, 3, 4

http://www.tom-e-white.com/2007/11/consistent-hashing.html
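
A minimal sketch of such a ring, assuming MD5 as the shared hash function and 100 virtual points per shard (the virtual points are an addition for a more even distribution and are not shown on the slide). The class `ConsistentHashRing` and its method names are hypothetical. Each object belongs to the first shard point found clockwise from the object's own position.

```python
import bisect
import hashlib


def ring_position(name: str) -> int:
    """Same hash function for shards and objects (assumption: MD5)."""
    return int(hashlib.md5(name.encode("utf-8")).hexdigest(), 16)


class ConsistentHashRing:
    def __init__(self, shards, points_per_shard=100):
        self._points_per_shard = points_per_shard
        self._positions = []   # sorted positions of all shard points
        self._owner = {}       # position -> shard name
        for shard in shards:
            self.add(shard)

    def add(self, shard):
        for i in range(self._points_per_shard):
            position = ring_position(f"{shard}#{i}")
            bisect.insort(self._positions, position)
            self._owner[position] = shard

    def remove(self, shard):
        self._positions = [p for p in self._positions if self._owner[p] != shard]
        self._owner = {p: s for p, s in self._owner.items() if s != shard}

    def shard_for(self, key):
        # First shard point clockwise from the key; wrap around at the end.
        index = bisect.bisect_right(self._positions, ring_position(key))
        return self._owner[self._positions[index % len(self._positions)]]


ring = ConsistentHashRing(["A", "B", "C"])
objects = [str(n) for n in range(1, 1001)]
before = {obj: ring.shard_for(obj) for obj in objects}

ring.remove("C")  # shard C goes down
after = {obj: ring.shard_for(obj) for obj in objects}
moved = [obj for obj in objects if before[obj] != after[obj]]
print(f"{len(moved)} of {len(objects)} objects moved")
```

Only objects that lived on the removed shard are reassigned; everything on A and B stays where it was, in contrast to the modulo scheme.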

59

Page 60: The Rise of NoSQL

CAP Theorem

C: Consistency

A: Availability

P: Partition Tolerance

http://www.cs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf

60

Page 61: The Rise of NoSQL

BASE (vs. ACID)

Basically Available

Soft State

Eventual Consistency

http://www.allthingsdistributed.com/2008/12/eventually_consistent.html

http://www.infoq.com/articles/pritchett-latency
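
Eventual consistency can be made tangible with a toy model. The sketch assumes three in-memory replicas where a write is acknowledged after reaching the first replica only and is passed on to the others later; the class `EventuallyConsistentStore` and its explicit `propagate` step are hypothetical simplifications of what real systems do in the background.

```python
class EventuallyConsistentStore:
    """Hypothetical sketch: writes reach one replica first, the others later."""

    def __init__(self, replica_count=3):
        self.replicas = [{} for _ in range(replica_count)]
        self._pending = []  # (replica index, key, value) still to be applied

    def write(self, key, value):
        # The write is accepted at once (available), even though
        # the other replicas have not seen it yet (soft state).
        self.replicas[0][key] = value
        for index in range(1, len(self.replicas)):
            self._pending.append((index, key, value))

    def read(self, key, replica_index):
        return self.replicas[replica_index].get(key)

    def propagate(self):
        # Stand-in for background replication: apply all outstanding updates.
        for index, key, value in self._pending:
            self.replicas[index][key] = value
        self._pending.clear()


store = EventuallyConsistentStore()
store.write("4711", "Max")
print(store.read("4711", replica_index=0))  # the new value
print(store.read("4711", replica_index=2))  # still stale
store.propagate()
print(store.read("4711", replica_index=2))  # consistent again
```

Between `write` and `propagate` two clients can see different answers for the same key; an ACID system would instead delay or reject one of the operations.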

61

Page 62: The Rise of NoSQL

Wrap Up

62

Page 63: The Rise of NoSQL

RDBMS will not die

Use a relational database unless you have good reason not to

63

Page 64: The Rise of NoSQL

RDBMS have their limits

Vertical scaling is expensive and has hard limits

Horizontal scaling is not possible or only possible to a limited extent

Joins on big and distributed tables are too expensive/too slow

A rigid schema is inappropriate for semi-structured/dynamic data (sparse tables)

Consistency is rated higher than availability

64

Page 65: The Rise of NoSQL

NoSQL comes to the rescue

Distribution and scalability are fundamental design goals of NoSQL DBs

Tradeoff between Consistency, Availability and horizontal scalability (CAP Theorem, BASE)

Small footprint in favor of ease of use

Outstandingly proven in practice (Google, Amazon, Facebook, LinkedIn, Twitter, ...)

65

Page 66: The Rise of NoSQL

There are cons too

Broad spectrum of products is difficult to understand

You have to get used to designing models for Key/Value or Column Family stores

Mostly no ad hoc queries

No standards - no portability

Sometimes poor documentation

Few commercial support offers

66

Page 67: The Rise of NoSQL

RDBMS vs. NoSQL

RDBMS                     NoSQL
think about data          think about queries
redundancy is bad         redundancy is ok
indexes managed by DB     manage own indexes
query over relations      no joins
always exact results      results may be out of date
SQL                       proprietary APIs
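
The "manage own indexes" and "redundancy is ok" entries can be illustrated together. The sketch assumes a plain key-value layout held in two dictionaries; the names `orders`, `orders_by_customer`, `save_order` and `orders_for`, and the customer "Eva", are invented for the example. Since the store can only look up by key, the application keeps a second, redundant map to answer the one query it cares about.

```python
orders = {}              # primary data: order id -> order document
orders_by_customer = {}  # hand-maintained index: customer -> set of order ids


def save_order(order_id, order):
    """Write the order and keep the secondary index in step with it."""
    previous = orders.get(order_id)
    if previous is not None:
        # The application, not the database, must remove the stale index entry.
        orders_by_customer[previous["customer"]].discard(order_id)
    orders[order_id] = order
    orders_by_customer.setdefault(order["customer"], set()).add(order_id)


def orders_for(customer):
    """The query the data layout was designed around."""
    return [orders[i] for i in sorted(orders_by_customer.get(customer, ()))]


save_order("4711", {"customer": "Max", "payment": "Credit Card"})
save_order("4712", {"customer": "Max", "payment": "Invoice"})
save_order("4711", {"customer": "Eva", "payment": "Credit Card"})  # order reassigned

print(orders_for("Max"))
print(orders_for("Eva"))
```

This is "think about queries" in miniature: the index exists because the query was known in advance, and a new kind of query means new code and new redundant data.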

67

Page 68: The Rise of NoSQL

Size

Complexity

Key Value

Column Family

Document

Graph

RDBMS

68

Page 69: The Rise of NoSQL

What's next?

69

Page 70: The Rise of NoSQL

Polyglot Persistence

NoSQL will break the relational dominance, unlike the OODBMSs in the 80s

RDBMS is not the one and only option any more

Select the storage technology that best fits your current situation

Enterprises will use different storage technologies for different kinds of data

The DB is no longer an integration point

Apps talk via WebServices and encapsulate their individual data storage technologies

70

Page 71: The Rise of NoSQL

NewSQL

The answer of traditional RDBMS vendors to the great success of NoSQL

Improved RDBMS offer more features and better scalability

Oracle launches Oracle NoSQL, their own NoSQL DB based upon a revised Berkeley DB

Oracle, Microsoft, Sybase, IBM, Greenplum, Pervasive already have a tight Hadoop integration

„Can‘t fight it? Embrace it!“

71

Page 72: The Rise of NoSQL

Links

72

Page 73: The Rise of NoSQL

Recommended Reads

Amazon Dynamo Paper: http://www.read.seas.harvard.edu/~kohler/class/cs239-w08/decandia07dynamo.pdf

Google Big Table Paper: http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/de//archive/bigtable-osdi06.pdf

NoSQL Archive: http://nosql-database.com

DB Engines Ranking: http://db-engines.com/en/ranking

73

Page 74: The Rise of NoSQL

Thx!

Arnd Kleinbeck

Senior Software Architect

Business Division Applications

@akleinbe

74