DotNetToscana: NoSQL Revolution - Scalability


DESCRIPTION

http://www.dotnettoscana.org/nosql-revolution.aspx

TRANSCRIPT

Page 1: DotNetToscana: NoSQL Revolution - Scalability

Scalability

Luigi Berrettini – http://it.linkedin.com/in/luigiberrettini
Nicola Baldi – http://it.linkedin.com/in/nicolabaldi

15/12/2012

Page 2: DotNetToscana: NoSQL Revolution - Scalability

The need for speed


Page 3: DotNetToscana: NoSQL Revolution - Scalability

An increasing demand for performance

Companies continuously grow:
• more and more data and traffic
• more and more computing resources needed

SOLUTION: SCALING


Page 4: DotNetToscana: NoSQL Revolution - Scalability

Scaling strategies

Vertical scalability = scale up
• single server
• performance ⇒ more resources (CPUs, storage, memory)
• volumes increase ⇒ more difficult and expensive to scale
• not reliable: individual machine failures are common

Horizontal scalability = scale out
• cluster of servers
• performance ⇒ more servers
• cheaper hardware (more likely to fail)
• volumes increase ⇒ complexity ~ constant, costs ~ linear
• reliability: CAN operate despite failures
• complex: use only if benefits are compelling


Page 5: DotNetToscana: NoSQL Revolution - Scalability

Vertical scalability


Page 6: DotNetToscana: NoSQL Revolution - Scalability

Single server

All data on a single node

Use cases
• data usage = mostly processing aggregates
• many graph databases

Pros/Cons
• RDBMSs or NoSQL databases
• simplest and most often recommended option
• only vertical scalability


Page 7: DotNetToscana: NoSQL Revolution - Scalability

Architectures and distribution models

Horizontal scalability


Page 8: DotNetToscana: NoSQL Revolution - Scalability

Scale out architectures (1)

Shared everything
• every node has access to all data
• all nodes share memory and disk storage
• used on some RDBMSs


Page 9: DotNetToscana: NoSQL Revolution - Scalability


Scale out architectures (2)


Shared disk
• every node has access to all data
• all nodes share disk storage
• used on some RDBMSs

Page 10: DotNetToscana: NoSQL Revolution - Scalability


Scale out architectures (3)


Shared nothing
• nodes are independent and self-sufficient
• no shared memory or disk storage
• used on some RDBMSs and all NoSQL databases

Page 11: DotNetToscana: NoSQL Revolution - Scalability

Shared nothing distribution models

Sharding: different data put on different nodes

Replication: same data copied over multiple nodes

Sharding + replication: the two orthogonal techniques combined


Page 12: DotNetToscana: NoSQL Revolution - Scalability

Sharding (1)

Different parts of the data onto different nodes
• data accessed together (aggregates) are on the same node
• clumps arranged by physical location, to keep load even, or according to any domain-specific access rule

[Diagram: three shards, each handling reads and writes for its own subset of the data (A F H, B E G, C D I)]
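As a rough illustration of the idea, a client (or a routing layer) can map each key to a shard, for example by hashing; the node names and the hash-based rule below are purely hypothetical, and real stores usually handle this routing for you:

    import hashlib

    # Hypothetical shard layout: three nodes, each owning part of the key space
    SHARDS = ["shard-0", "shard-1", "shard-2"]

    def shard_for(key: str) -> str:
        """Route a key to a shard by hashing it; choosing the shard key along
        aggregate boundaries keeps data accessed together on the same node."""
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return SHARDS[int(digest, 16) % len(SHARDS)]

    # Every record belonging to customer 42 lands on the same node
    print(shard_for("customer:42"))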

Page 13: DotNetToscana: NoSQL Revolution - Scalability

Sharding (2)

Use cases
• different people access different parts of the dataset
• need to horizontally scale writes

Pros/Cons
• “manual” sharding is possible with every RDBMS or NoSQL store
• better read performance
• better write performance
• low resilience: when a node fails, its data becomes unavailable (all other shards stay available)
• high licensing costs for RDBMSs
• difficult or impossible cluster-level operations (querying, transactions, consistency controls)


Page 14: DotNetToscana: NoSQL Revolution - Scalability

Master-slave replication (1)

Data replicated across multiple nodes

One designated master (primary) node
• contains the original
• processes writes and passes them on

All other nodes are slaves (secondary)
• contain the copies
• are synchronized with the master during a replication process


Page 15: DotNetToscana: NoSQL Revolution - Scalability

Master-slave replication (2)


[Diagram: master-slave replication – one master handling reads and writes, two slaves handling reads only, all holding the same data (A B C)]

Page 16: DotNetToscana: NoSQL Revolution - Scalability

Master-slave replication (3)

Use cases
• load balancing cluster: data usage mostly read-intensive
• failover cluster: single server with hot backup

Pros/Cons
• better read performance
• worse write performance (write management)
• high read (slave) resilience: master failure ⇒ slaves can still handle read requests
• low write (master) resilience: master failure ⇒ no writes until the old/new master is up
• read inconsistencies: update not propagated to all slaves
• master = bottleneck and single point of failure
• high licensing costs for RDBMSs

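A minimal sketch of how a client might route operations under master-slave replication; the connection strings below are hypothetical, and most drivers expose an equivalent read-preference setting:

    import random

    # Hypothetical cluster: one master accepts writes, slaves serve reads
    MASTER = "db-master:5432"
    SLAVES = ["db-slave-1:5432", "db-slave-2:5432"]

    def node_for(operation: str) -> str:
        """Send writes to the master and spread reads across the slaves;
        a read from a slave may return data not yet replicated (stale)."""
        if operation == "write":
            return MASTER
        return random.choice(SLAVES)

    print(node_for("write"))  # always the master
    print(node_for("read"))   # one of the slaves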

Page 17: DotNetToscana: NoSQL Revolution - Scalability


Peer-to-peer / multi-master replication (1)


Data replicated across multiple nodes

All nodes are peers (equal weight): no master, no slaves

All nodes can both read and write

Page 18: DotNetToscana: NoSQL Revolution - Scalability

Peer-to-peer / multi-master replication (2)


[Diagram: peer-to-peer replication – three peers, each handling reads and writes, all holding the same data (A B C)]

Page 19: DotNetToscana: NoSQL Revolution - Scalability

Peer-to-peer / multi-master replication (3)

Use cases
• load balancing cluster: data usage read/write-intensive
• need to scale out more easily

Pros/Cons
• better read performance
• better write performance
• high resilience: node failure ⇒ reads/writes handled by other nodes
• read inconsistencies: update not propagated to all nodes
• write inconsistencies: same record written on different nodes at the same time
• high licensing costs for RDBMSs

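A toy in-memory illustration (not a real store) of the write-inconsistency risk listed above: two peers accept concurrent writes to the same record and diverge until replication reconciles them:

    # Each peer holds its own copy of the same record
    peer_1 = {"user:1": "Alice"}
    peer_2 = {"user:1": "Alice"}

    peer_1["user:1"] = "Alicia"   # client A updates the record on peer 1
    peer_2["user:1"] = "Alyce"    # client B updates it on peer 2 at the same time

    # Until the peers exchange updates and resolve the conflict, they disagree:
    # this is a write-write conflict
    print(peer_1["user:1"], peer_2["user:1"])  # Alicia Alyce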

Page 20: DotNetToscana: NoSQL Revolution - Scalability

Sharding + replication

Sharding + master-slave replication
• multiple masters, but each data item has a single master
• node configurations:
  • master
  • slave
  • master for some data / slave for other data

Sharding + peer-to-peer replication


Page 21: DotNetToscana: NoSQL Revolution - Scalability

Sharding + master-slave replication


[Diagram: sharding + master-slave replication – each shard (A F H, B E G, C D I) has one master handling reads and writes and one slave handling reads only; a single node can act as master for one shard and slave for another]

Page 22: DotNetToscana: NoSQL Revolution - Scalability

Sharding + peer-to-peer replication


[Diagram: sharding + peer-to-peer replication – each shard (A F H, B E G, C D I) is replicated across several peers, every peer handling reads and writes for the shards it hosts]

Page 23: DotNetToscana: NoSQL Revolution - Scalability

Scaling out on RDBMSs (1)


Oracle Database
• Oracle RAC: shared everything

Microsoft SQL Server
• all editions: shared nothing, master-slave replication

IBM DB2
• DB2 pureScale: shared disk
• DB2 HADR: shared nothing, master-slave replication (failover cluster)

Page 24: DotNetToscana: NoSQL Revolution - Scalability

Scaling out on RDBMSs (2)


Oracle MySQL
• MySQL Cluster: shared nothing, sharding, replication, sharding + replication

PostgreSQL (The PostgreSQL Global Development Group)
• PGCluster-II: shared disk
• Postgres-XC: shared nothing, sharding, replication, sharding + replication

Page 25: DotNetToscana: NoSQL Revolution - Scalability

Consistency

Horizontal scalability


Page 26: DotNetToscana: NoSQL Revolution - Scalability


Inconsistencies due to concurrency

Inconsistent write = write-write conflict: multiple writes of the same data at the same time (highly likely with peer-to-peer replication)

Inconsistent read = read-write conflict: a read in the middle of someone else’s write


Page 27: DotNetToscana: NoSQL Revolution - Scalability


Write consistency

Pessimistic approach: prevent conflicts from occurring

Optimistic approach: detect conflicts and fix them


Page 28: DotNetToscana: NoSQL Revolution - Scalability

Pessimistic approach

Implementation
• write locks ⇒ acquire a lock before updating a value (only one lock at a time can be taken; see the sketch below)

Pros/Cons
• often severely degrades system responsiveness
• often leads to deadlocks (hard to prevent/debug)
• relies on a consistent serialization of the updates*

* sequential consistency: ensuring that all nodes apply operations in the same order

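A minimal sketch of the pessimistic approach on a toy in-memory store (per-key locks, not any particular database API):

    import threading

    # Toy in-memory store with one write lock per key
    store = {"account:1": 100}
    locks = {"account:1": threading.Lock()}

    def locked_update(key: str, delta: int) -> None:
        """Acquire the write lock before updating, so only one writer at a
        time can touch the value; other writers block until it is released."""
        with locks[key]:
            store[key] += delta

    locked_update("account:1", -50)
    print(store["account:1"])  # 50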

Page 29: DotNetToscana: NoSQL Revolution - Scalability

Optimistic approach


Implementation
• conditional updates ⇒ test a value before updating it, to see whether it has changed since the last read (see the sketch below)
• merged updates ⇒ save the conflicting updates, record the conflict and merge them somehow

Pros/Cons
• conditional updates rely on a consistent serialization of the updates*

* sequential consistency: ensuring that all nodes apply operations in the same order
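A minimal sketch of a conditional update on a toy in-memory store that keeps a version number next to each value (real stores expose the same idea as compare-and-set or check-and-set operations):

    # Toy store: key -> (value, version)
    store = {"account:1": (100, 0)}

    def conditional_update(key, new_value, expected_version):
        """Apply the write only if the version is unchanged since the last
        read; otherwise report a conflict so the caller can retry or merge."""
        value, version = store[key]
        if version != expected_version:
            return False                       # someone else wrote in between
        store[key] = (new_value, version + 1)
        return True

    value, version = store["account:1"]
    print(conditional_update("account:1", value - 50, version))  # True
    print(conditional_update("account:1", value - 10, version))  # False: stale version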

Page 30: DotNetToscana: NoSQL Revolution - Scalability

Read consistency


Logical consistency: different data make sense together

Replication consistency: same data ⇒ same value on different replicas

Read-your-writes consistency: users continue seeing their updates

Page 31: DotNetToscana: NoSQL Revolution - Scalability

Logical consistency

ACID transactions ⇒ aggregate-ignorant DBs

Partially atomic updates ⇒ aggregate-oriented DBs
• atomic updates within an aggregate
• no atomic updates between aggregates
• updates of multiple aggregates: inconsistency window
• replication can lengthen inconsistency windows


Page 32: DotNetToscana: NoSQL Revolution - Scalability

Replication consistency

Eventual consistency

• nodes may have replication inconsistencies: stale (out of date) data
• eventually all nodes will be synchronized


Page 33: DotNetToscana: NoSQL Revolution - Scalability

Read-your-writes consistency

Session consistency
• within a user’s session there is read-your-writes consistency (no stale data is read from a node after an update on another one)
• consistency is lost if
  • the session ends
  • the system is accessed simultaneously from different PCs
• implementations
  • sticky session / session affinity = sessions tied to one node: affects load balancing and is quite intricate with master-slave replication
  • version stamps: track the latest version stamp seen by a session and ensure that all interactions with the data store include it (see the sketch below)

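A toy sketch of the version-stamp idea: the session remembers the highest stamp it has seen and rejects reads from replicas that are still behind it (node names and the in-memory replicas are purely illustrative):

    # Each replica holds (value, version stamp) for the same key
    replicas = {"node-a": ("v1", 1), "node-b": ("v1", 1)}
    session_last_seen = 0   # highest stamp this session has written or read

    def write(node: str, value: str, stamp: int) -> None:
        global session_last_seen
        replicas[node] = (value, stamp)
        session_last_seen = stamp

    def read(node: str) -> str:
        """Read-your-writes: refuse a replica older than what the session saw."""
        value, stamp = replicas[node]
        if stamp < session_last_seen:
            raise RuntimeError("replica is stale for this session, read elsewhere")
        return value

    write("node-a", "v2", 2)   # the update has reached node-a only
    print(read("node-a"))      # v2
    print(read("node-b"))      # raises: node-b has not caught up yet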

Page 34: DotNetToscana: NoSQL Revolution - Scalability

CAP theorem

Horizontal scalability


Page 35: DotNetToscana: NoSQL Revolution - Scalability


Definitions

Consistency
all nodes see the same data at the same time

Latency
the response time in interactions between nodes

Availability
• every nonfailing node must reply to requests
• the limit of latency that we are prepared to tolerate: once latency gets too high, we give up and treat data as unavailable

Partition tolerance
the cluster can survive communication breakages (separating it into partitions unable to communicate with each other)


Page 36: DotNetToscana: NoSQL Revolution - Scalability


ACID (1)

Transaction to transfer $50 from account A to account B:
1) read(A)
2) A = A – 50
3) write(A)
4) read(B)
5) B = B + 50
6) write(B)

Atomicity
• transaction fails after 3 and before 6 ⇒ the system should ensure that its updates are not reflected in the database

Consistency
• A + B is unchanged by the execution of the transaction

Page 37: DotNetToscana: NoSQL Revolution - Scalability


ACID (2)

Transaction to transfer $50 from account A to account B:
1) read(A)
2) A = A – 50
3) write(A)
4) read(B)
5) B = B + 50
6) write(B)

Isolation
• another transaction will see inconsistent data between 3 and 6 (A + B will be less than it should be)
• isolation can be ensured trivially by running transactions serially ⇒ performance issue

Durability
• user notified that the transaction completed ($50 transferred) ⇒ transaction updates must persist despite failures
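A minimal runnable version of the same transfer as a single ACID transaction, using SQLite purely as a convenient example engine:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 100), ("B", 0)])

    try:
        # The context manager commits the transfer as a unit, or rolls it back
        # entirely if any statement fails (atomicity)
        with conn:
            conn.execute("UPDATE accounts SET balance = balance - 50 WHERE id = 'A'")
            conn.execute("UPDATE accounts SET balance = balance + 50 WHERE id = 'B'")
    except sqlite3.Error:
        pass  # after a rollback neither update is visible

    # A + B is unchanged by the transaction (consistency)
    print(conn.execute("SELECT SUM(balance) FROM accounts").fetchone()[0])  # 100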

Page 38: DotNetToscana: NoSQL Revolution - Scalability

BASE


Basically Available
Soft state
Eventually consistent

Soft state and eventual consistency are techniques that work well in the presence of partitions and thus promote availability

Page 39: DotNetToscana: NoSQL Revolution - Scalability


CAP theorem (Brewer, Gilbert, Lynch)

Given the three properties of Consistency, Availability and Partition tolerance, you can only get two


Page 40: DotNetToscana: NoSQL Revolution - Scalability


Single server system: CA

C: being up and keeping consistency is reasonable

A: one node, if it’s up it’s available

P: a single machine can’t partition


Page 41: DotNetToscana: NoSQL Revolution - Scalability


Two-node cluster: AP


AP (no C): a partition ⇒ an update on one node = inconsistency

Page 42: DotNetToscana: NoSQL Revolution - Scalability


Two-node cluster: CP


CP (no A): a partition ⇒ consistency only if one nonfailing node stops replying to requests

Page 43: DotNetToscana: NoSQL Revolution - Scalability


Two-node cluster: CA


CA (no P)
• while the nodes can communicate ⇒ C and A can be preserved
• on a partition ⇒ all nodes in one partition must be turned off (failing nodes preserve A): difficult and expensive

Page 44: DotNetToscana: NoSQL Revolution - Scalability


It is all about trading off (1)

ACID databases: focus on consistency first and availability second

BASE databases: focus on availability first and consistency second


Page 45: DotNetToscana: NoSQL Revolution - Scalability

It is all about trading off (2)

Single server
• no partitions
• consistency versus performance: relaxed isolation levels or no transactions

Cluster
• consistency versus latency/availability
• durability versus performance (e.g. in-memory DBs)
• durability versus latency (e.g. the master acknowledges the update to the client only after having been acknowledged by some slaves)


Page 46: DotNetToscana: NoSQL Revolution - Scalability


Master-slave replication and strong consistency

strong write consistency ⇒ write to the master

strong read consistency ⇒ read from the master


Page 47: DotNetToscana: NoSQL Revolution - Scalability

Peer-to-peer replication and strong consistency

N = replication factor (nodes involved in replication, NOT nodes in the cluster)

W = nodes confirming a write
R = nodes needed for a consistent read

write quorum: W > N/2
read quorum: R + W > N

Consistency is on a per-operation basis

Choose the most appropriate combination of problems and advantages
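A small sketch of the two quorum conditions as plain arithmetic checks, with W and R chosen per operation as the slide notes:

    def write_quorum_ok(w: int, n: int) -> bool:
        """A write is safe from conflicting concurrent writes if W > N/2."""
        return w > n / 2

    def read_quorum_ok(r: int, w: int, n: int) -> bool:
        """A read is consistent if it must overlap every write quorum: R + W > N."""
        return r + w > n

    N = 3  # replication factor
    # W = 2, R = 2: both quorums hold, so reads and writes are strongly consistent
    print(write_quorum_ok(2, N), read_quorum_ok(2, 2, N))  # True True

    # W = 1 makes writes faster, but a consistent read then needs R = 3
    print(write_quorum_ok(1, N), read_quorum_ok(3, 1, N))  # False True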