NoSQL and ACID
DESCRIPTION
This presentation, given by Dave Rosenthal at NoSQL Now! 2013, makes the case that NoSQL databases will need to support ACID transactions in order for developers to more easily build, deploy, and scale applications in the future.
TRANSCRIPT
NoSQL’s Motivation
Make it easy to build and deploy applications.
Ease of scaling and operation
Fault tolerance
Many data models
Good price/performance
✗ ACID transactions
What if we had ACID?
Good for financial applications?
Big performance hit?
Sacrifice availability?
Nope… When NoSQL has ACID, it opens up a very different path.
Bugs don’t appear under concurrency
• ACID means isolation.
• Reason locally rather than globally.
– If every transaction maintains an invariant, then multiple clients running any combination of concurrent transactions also maintain that invariant.
• The impact of each client is isolated.
Isolation means strong abstractions
• Example interface:
– storeUser(name, SSN)
– getName(SSN)
– getSSN(name)
• Invariant: N == getName(getSSN(N))
– Always works with a single client.
– Without ACID: fails with concurrent clients.
– With ACID: works with concurrent clients.
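The invariant can be sketched in code. This is a toy in-memory model, not any real database API; a single lock stands in for serializable transactions, making the two writes in storeUser atomic with respect to concurrent readers:

```python
import threading

# Toy in-memory model: one lock stands in for serializable transactions.
class ToyDB:
    def __init__(self):
        self.name_to_ssn = {}
        self.ssn_to_name = {}
        self.lock = threading.Lock()

    def store_user(self, name, ssn):
        # With "ACID": both writes commit atomically, so no reader ever
        # observes one mapping without the other.
        with self.lock:
            self.name_to_ssn[name] = ssn
            self.ssn_to_name[ssn] = name

    def get_ssn(self, name):
        with self.lock:
            return self.name_to_ssn[name]

    def get_name(self, ssn):
        with self.lock:
            return self.ssn_to_name[ssn]

db = ToyDB()
threads = [threading.Thread(target=db.store_user, args=(f"user{i}", f"ssn{i}"))
           for i in range(50)]
for t in threads: t.start()
for t in threads: t.join()

# The invariant holds under any interleaving of concurrent clients.
assert all(db.get_name(db.get_ssn(f"user{i}")) == f"user{i}" for i in range(50))
```

Without the lock, a reader could interleave between the two dictionary writes and see a name with no matching SSN entry; this is the local-vs-global reasoning point above.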
Abstractions
Abstractions built on a scalable, fault tolerant, transactional foundation inherit those properties.
And are easy to build…
Examples of “easy”
SQL database in one day
Indexed table layer (3 days * 1 intern)
Fractal spatial index in 200 lines
Remove/decouple features from the DB
With strong abstractions, features can be
moved from the DB to more flexible code.
Examples:
– Indexing
– More efficient data structures (e.g. using
pointers/indirection)
– Query language
Remove/decouple data models
• A NoSQL database with ACID can provide polyglot data models and APIs.
– Key-value, graph, column-oriented, document, relational, publish-subscribe, spatial, blobs, ORMs, analytics, etc.
• Without requiring separate physical databases. This is a huge ops win.
Databases in 2008
NoSQL emerges to replace scalable sharding/caching solutions that had already thrown out consistency.
• BigTable
• Dynamo
• Voldemort
• Cassandra
The CAP2008 theorem
“Data inconsistency in large-scale reliable distributed systems has to be tolerated… [for performance and to handle faults]”
- Werner Vogels (CTO, Amazon.com)
The CAP2008 theorem
“The availability property means that the system is ‘online’ and the client of the system can expect to receive a response for its request.”
- Wrong descriptions all over the web
CAP2008 Conclusions?
• Scaling requires distributed design
• Distributed requires high availability
• Availability requires no C
So, if we want scalability we have to give up C, a cornerstone of ACID, right?
Fast forward to CAP2013
“Why ’2 out of 3’ is misleading”
“CAP prohibits… perfect availability”
- Eric Brewer
Fast forward to CAP2013
“Achieving strict consistency can come at a cost in update or read latency, and may result in lower throughput…”
- Werner Vogels (Amazon CTO)
Fast forward to CAP2013
“…it is better to have application programmers deal with performance problems due to overuse of transactions as bottlenecks arise, rather than always coding around the lack of transactions.”
- Google (Spanner)
The ACID NoSQL plan
• Maintain both scalability and fault tolerance
• Leverage CAP2013 and deliver a CP system
with true global ACID transactions
• Enable abstractions and many data models
• Deliver high per-node performance
NoSQL
TRANSACTIONS / LOCKING
Bolt-on approach
Bolt transactions on top of a database without transactions.
• Upside: Elegance.
• Downsides:– Nerd trap
– Performance. “…integrating multiple layers has its advantages: integrating concurrency control with replication reduces the cost of commit wait in Spanner, for example” -Google
Transactional building block approach
Use non-scalable transactional DBs as components of a cluster.
• Upside: Local transactions are fast
• Downside: Distributed transactions across machines are hard to make fast, and are messy (timeouts required)
Decomposition approach
Decompose the processing pipeline of a traditional ACID DB into individual stages.
• Stages:
– Accept client transactions
– Apply concurrency control
– Write to transaction logs
– Update persistent data representation
• Upside: Performance
• Downside: “Ugly” and complex architecture needs to solve tough problems for each stage
Disconnected operation challenge
• Offline sync is a real application need
Solution:
• Doing it in the DB layer is terrible
• Can (and should) be solved by the app, e.g. by buffering mutations and syncing when connected
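A minimal sketch of that app-level approach, with hypothetical names and the simplest possible merge policy (last-writer-wins replay, no conflict resolution):

```python
# App-level offline buffering (hypothetical API; last-writer-wins replay).
class OfflineClient:
    def __init__(self, remote):
        self.remote = remote   # stands in for the consistent database
        self.buffer = []       # mutations queued while disconnected
        self.online = True

    def set(self, key, value):
        if self.online:
            self.remote[key] = value
        else:
            self.buffer.append((key, value))

    def reconnect(self):
        # Replay buffered mutations in order; a real app would resolve
        # conflicts here instead of blindly overwriting.
        self.online = True
        for key, value in self.buffer:
            self.remote[key] = value
        self.buffer.clear()

remote = {}
c = OfflineClient(remote)
c.set("a", 1)
c.online = False               # go offline
c.set("b", 2)
c.set("b", 3)
c.reconnect()
print(remote)  # {'a': 1, 'b': 3}
```

Keeping this logic in the app means the database itself never has to weaken its consistency guarantees to support disconnected clients.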
Split brain challenge
• Any consistent database needs a fault-tolerant source of “ground truth”
• Must prevent database from splitting into two independent parts
Solution:
• Using thoughtfully chosen Paxos nodes can yield high availability, even for drastic failure scenarios
• Paxos is not required for each transaction
Latency challenge
• Durability costs latency
• Causal consistency costs latency
Solution:
• Bundling ops reduces overhead
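The bundling point can be sketched as a batcher that pays the fixed per-commit cost (fsync, network round trip) once per batch instead of once per write. The API names here are hypothetical; commits are counted rather than timed:

```python
# Bundling sketch: group writes so the fixed per-commit cost is
# amortized across a whole batch (hypothetical API).
class Batcher:
    def __init__(self, commit_fn, max_batch=100):
        self.commit_fn = commit_fn    # called once per durable commit
        self.max_batch = max_batch
        self.pending = []

    def write(self, key, value):
        self.pending.append((key, value))
        if len(self.pending) >= self.max_batch:
            self.flush()

    def flush(self):
        if self.pending:
            self.commit_fn(list(self.pending))
            self.pending.clear()

commits = []                          # records each committed batch
b = Batcher(commits.append, max_batch=100)
for i in range(1000):
    b.write(i, i)
b.flush()
print(len(commits))  # 10 commits instead of 1000
```

The durability latency per commit is unchanged, but throughput rises because 1000 writes now share 10 commits instead of paying 1000.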
• ACID costs only needed for ACID guarantees
Correctness challenge
• MaybeDB:
– Set(key, value) – might set key to value
– Get(key) – get a value that key was set to
Solution:
• The much stronger ACID contract requires vastly more powerful tools for testing
Implementation language challenge
We need new tools!
Goal → Language
Many asynchronous communicating processes → Erlang?
Engineering for reliability and fault tolerance of large clusters while maintaining correctness → Simulation
Fast algorithms; efficient I/O → C++
Flow
• A new programming language
• Adds actor-model concurrency to C++11
• New keywords: ACTOR, future, promise, wait, choose, when, streams
• Transcompilation: Flow code → C++11 code → native
Seriously?
Flow allows…
• Easier ACTOR-model coding
• Testability by enabling simulation
• Performance by compiling to native
Flow performance
“Write a ring benchmark. Create N processes in a ring. Send a message round the ring M times so that a total of N * M messages get sent. Time how long this takes for different values of N and M. Write a similar program in some other programming language you are familiar with. Compare the results. Write a blog, and publish the results on the internet!”
- Joe Armstrong (author of “Programming Erlang”)
Flow performance (N=1000, M=1000)
• Ruby (using threads): 1990 seconds
• Ruby (queues): 360 seconds
• Objective C (using threads): 26 seconds
• Java (threads): 12 seconds
• Stackless Python: 1.68 seconds
• Erlang: 1.09 seconds
• Google Go: 0.87 seconds
• Flow: 0.075 seconds
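For comparison, the ring benchmark can be sketched in Python with asyncio. This illustrates the message-passing pattern being measured, not the Flow implementation; the shutdown scheme (a zero token passed once around the ring) is this sketch's own:

```python
import asyncio

async def ring(n, m):
    # N nodes in a ring; a hop counter starting at N*M makes N*M hops,
    # then a zero propagates once around the ring to stop every node.
    queues = [asyncio.Queue() for _ in range(n)]
    delivered = 0  # counts every message receipt, including shutdowns

    async def node(i):
        nonlocal delivered
        while True:
            hops = await queues[i].get()
            delivered += 1
            if hops == 0:
                if i != n - 1:              # pass the shutdown along once
                    queues[i + 1].put_nowait(0)
                return
            queues[(i + 1) % n].put_nowait(hops - 1)

    tasks = [asyncio.create_task(node(i)) for i in range(n)]
    queues[0].put_nowait(n * m)             # inject the token
    await asyncio.gather(*tasks)
    return delivered

total = asyncio.run(ring(100, 100))
print(total)  # 10000 token hops + 100 shutdown receipts = 10100
```

Timing this for the benchmark's N=1000, M=1000 and comparing against the figures above is left as Armstrong's exercise suggests.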
Flow enables testability
• “Lithium” testing framework
• Simulate all physical interfaces
• Simulate failure modes
• Deterministic (!) simulation of entire system
Simulation is the key for correctness.
FoundationDB
Database software:
• Scalable
• Fault tolerant
• Transactional
• Ordered key-value API
• Layers
Layers
[Diagram: layers stacked on top of the key-value API]
• An open-source ecosystem
• Common NoSQL data models
• Graph database (implements the Blueprints 2.4 standard)
• ZooKeeper-like coordination layer
• Celery (distributed task queue) layer
• Many others…
SQL Layer
• A full SQL database in a layer!
• Akiban acquisition
• Unique “table group” concept can physically store related tables in an efficient “object structure”
• Architecture: stateless, local server
Performance results
• Reads of cacheable data are ½ the speed of memcached—with full consistency!
• Random uncacheable reads of 4k ranges saturate network bandwidth
• A 24-machine cluster processing 100% cross-node transactions saturates its SSDs at 890,000 op/s
The big performance result
• Vogels: “Achieving strict consistency can come at a cost in update or read latency, and may result in lower throughput…”
• OK, so how much? Only ~10%!
• Transaction isolation (the “intuitive bottleneck”) is accomplished in less than one core.
A vision for NoSQL
• The next generation should maintain:
– Scalability and fault tolerance
– High performance
• While adding:
– ACID transactions
– Data model flexibility