BcnDevCon 2013: Using Cassandra and Zookeeper to build a distributed, high performance system

Using Zookeeper and Cassandra to build a distributed, high performance system. Galo Navarro, @srvaroa. BcnDevCon13

Upload: galo-navarro

Post on 25-Jun-2015


DESCRIPTION

Slides from my presentation at BCN Dev Con 2013.

TRANSCRIPT

Page 1: BcnDevCon 2013: Using Cassandra and Zookeeper to build a distributed, high performance system

Using Zookeeper and Cassandra to build a distributed, high performance system

Galo Navarro, @srvaroa

BcnDevCon13

Page 2

About me

Background: backend & architecture in high traffic systems

Current: software engineer @ Midokura

Page 3

A talk about databases

Page 4

Takeaways

New technologies signal a different emphasis in how each problem is solved

Solutions are not exclusive: you can combine them.

Go beyond artificial SQL-NoSQL antagonisms:

We share some fundamental problems:
● Latency, availability, durability...
● True today, 20 years ago, 3000 years ago

Page 5

MidoNet

Distributed network virtualization system

Context, dataset, requirements

https://www.midokura.com/

Page 6

Virtualization

Computational resources on demand

Page 7

[Diagram: rows of VMs, labelled "the cloud"]

Page 8

Midokura's use case

[Diagram: a virtual network of VMs, two vSwitches, a vRouter and the internet]

Each client that rents VMs in the datacentre wants to network them as if they were its own physical resources (e.g. same L2 domain, private addresses, isolation...).

MidoNet allows the owner of the datacentre to provide that service.

Page 9

Dataset

Virtual network topology

Virtual network state

Metrics, audit logs, monitoring

[Diagram: a vRouter, two vSwitches and the internet]

Routing tables

destination IP    gateway
192.168.0.0/16    192.168.0.12
66.82.1.0/16      66.82.1.1
0.0.0.0/32        10.0.2.1

ARP table

IP              MAC
192.168.1.23    aa:bb:cc:dd:ee:ff
192.168.1.11    11:22:33:44:55:66
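To make the routing and ARP tables in the dataset concrete, here is a minimal sketch of how such state could be queried: a longest-prefix-match lookup for the gateway, plus an ARP lookup. The function names (`next_hop`, `resolve_mac`) are hypothetical, and the prefixes are written in canonical form with a `/0` default route, which is an assumption and differs slightly from the values shown on the slide. This is not MidoNet's implementation.

```python
import ipaddress

# Illustrative routing table (assumed values, canonical prefixes).
ROUTES = [
    (ipaddress.ip_network("192.168.0.0/16"), "192.168.0.12"),
    (ipaddress.ip_network("66.82.0.0/16"), "66.82.1.1"),
    (ipaddress.ip_network("0.0.0.0/0"), "10.0.2.1"),  # default route
]

# ARP table entries as shown on the slide.
ARP = {
    "192.168.1.23": "aa:bb:cc:dd:ee:ff",
    "192.168.1.11": "11:22:33:44:55:66",
}


def next_hop(destination):
    """Return the gateway of the most specific route matching destination."""
    address = ipaddress.ip_address(destination)
    matches = [(net, gw) for net, gw in ROUTES if address in net]
    if not matches:
        return None
    # Longest prefix wins.
    _, gateway = max(matches, key=lambda match: match[0].prefixlen)
    return gateway


def resolve_mac(ip):
    """Return the MAC address known for ip, or None if there is no entry."""
    return ARP.get(ip)
```

For example, `next_hop("192.168.1.23")` selects the `/16` route over the default route because its prefix is longer.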

Page 10

Usage

[Diagram: VMs spread across physical hosts; one host loads the virtual topology]

A daemon captures packets sent from the VMs on each physical host.

On new packets, it loads a view of the virtual topology from a (distributed) data store.

Page 11

Usage

[Diagram: VMs spread across physical hosts]

The daemon simulates the packet's trip through the virtual network until it reaches a destination VM, and identifies the host.

It then instructs the kernel to route similar packets via a tunnel.

Page 12

MidoNet architecture

[Diagram: API, hosts and storage cluster connected by an IP bus]

Page 13

Constraints

[Diagram: Consistency, Availability and Partition Tolerance, labelled "negotiable" and "critical"]

What happens if our service doesn't handle network partitions, a faulty master, GC pauses, latency, lags, locks...?

- Not just N users unable to see their profiles
- But infrastructure failure in the entire datacentre

Page 14

Midokura's use case

Coming to “NoSQL” not from “Big Data”

But looking for specific mixes of

● Availability
● Fault tolerance
● Performance
● Durability
● Low operational cost

How are Cassandra and Zookeeper useful?

Page 15

Virtual Network State

Assorted data

Metrics

https://cassandra.apache.org/

Page 16

Cassandra elevator pitch

A massively scalable open source NoSQL database

Supports large amounts of structured, semi-structured, and unstructured data (key-value)

Across multiple data centers

Performance, availability, linear scalability, with no single point of failure (SPOF)

Page 17

Cassandra architecture

[Diagram: nodes spread across DC1, DC2 and DC3, with clients]

● P2P
● No privileged nodes
● Unified view

Page 18

Fault tolerance

Replication Factor = 3

Consistency level = QUORUM

[Diagram: write (x) is sent to the replicas; the healthy ones answer ok, the faulty node FAILs, and the write still returns ok]
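The arithmetic behind this slide can be sketched as follows: with a replication factor of 3, a quorum is 2 replicas, so a write still succeeds with one faulty node. The helper names are hypothetical and the code is a plain simulation, not a Cassandra client.

```python
def quorum(replication_factor):
    """A quorum is a strict majority of the replicas."""
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor):
    """Replicas that may be down while QUORUM can still be reached."""
    return replication_factor - quorum(replication_factor)


def write(replica_up, required_acks):
    """Simulate a write: each live replica acknowledges it."""
    acks = sum(1 for up in replica_up if up)
    return "ok" if acks >= required_acks else "FAIL"
```

With `replica_up = [True, True, False]` and `required_acks = quorum(3)`, the write returns `"ok"`; with a second replica down it fails.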

Page 19

Fault tolerance

[Diagram: w(x) is acknowledged with ok]

Hinted handoff: the coordinator holds the data until the faulty replica recovers
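A minimal in-memory sketch of the hinted handoff idea, assuming hypothetical `Replica` and `Coordinator` classes: the coordinator keeps a hint for every replica it could not reach and replays it once the replica is back. Real Cassandra also bounds how long hints are kept, which this sketch ignores.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}

    def write(self, key, value):
        if not self.up:
            raise ConnectionError(f"{self.name} is down")
        self.data[key] = value


class Coordinator:
    def __init__(self, replicas):
        self.replicas = replicas
        self.hints = []  # (replica, key, value) writes awaiting delivery

    def write(self, key, value, required_acks):
        acks = 0
        for replica in self.replicas:
            try:
                replica.write(key, value)
                acks += 1
            except ConnectionError:
                # Keep the write as a hint for the unreachable replica.
                self.hints.append((replica, key, value))
        return acks >= required_acks

    def replay_hints(self):
        """Deliver stored hints to replicas that have recovered."""
        pending = []
        for replica, key, value in self.hints:
            try:
                replica.write(key, value)
            except ConnectionError:
                pending.append((replica, key, value))
        self.hints = pending
```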

Page 20

Fault tolerance

RF = 3

CL = QUORUM

[Diagram: read(x); the replicas return x]

Page 21

Consistency

RF = 3

CL = QUORUM

[Diagram: read(x); replicas return x, but one returns a different value x']

The coordinator will wait until the CL can be satisfied across replicas (or fail). The CL can also be 1, 2, ALL...

Page 22

Consistency

read_repair: an order is issued to the disagreeing node to reconcile its local copy

[Diagram: replicas marked "?"]
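The read path with a consistency level and read repair can be sketched like this. It is a simplified model under stated assumptions: replicas are plain dicts holding `(timestamp, value)` cells, the newest timestamp wins, and every live replica is contacted (a real coordinator contacts fewer and compares digests). `quorum_read` is a hypothetical name.

```python
def quorum_read(replicas, key, required):
    """Read key from the live replicas, repair stale copies, return the value.

    replicas: list of dicts mapping key -> (timestamp, value); None means
    the replica is down.
    """
    responses = [(r, r[key]) for r in replicas if r is not None and key in r]
    if len(responses) < required:
        raise RuntimeError("consistency level cannot be satisfied")
    # Last write wins: pick the cell with the highest timestamp.
    newest = max((cell for _, cell in responses), key=lambda cell: cell[0])
    for replica, cell in responses:
        if cell != newest:
            replica[key] = newest  # read repair of the disagreeing replica
    return newest[1]
```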

Page 23

Multi DC

RF = 6
CL = LOCAL_QUORUM (quorum inside the local DC)
CL = EACH_QUORUM (quorum on each DC)

Minimizes expensive network trips

[Diagram: a client request reaching DC 1 and DC 2]
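As a worked example of the two consistency levels, the sketch below assumes the RF of 6 is split as 3 replicas in each of two datacentres (the slide does not state the split). The function names are hypothetical.

```python
def quorum(replicas):
    return replicas // 2 + 1


def required_acks(level, replicas_per_dc, local_dc):
    """Acknowledgements needed per datacentre for a consistency level."""
    if level == "LOCAL_QUORUM":
        return {local_dc: quorum(replicas_per_dc[local_dc])}
    if level == "EACH_QUORUM":
        return {dc: quorum(n) for dc, n in replicas_per_dc.items()}
    raise ValueError(f"unsupported level: {level}")


def satisfied(level, replicas_per_dc, local_dc, acks_per_dc):
    need = required_acks(level, replicas_per_dc, local_dc)
    return all(acks_per_dc.get(dc, 0) >= n for dc, n in need.items())
```

Under that assumption, LOCAL_QUORUM needs 2 acknowledgements from the local DC only, so the request never has to wait on the remote DC, while EACH_QUORUM needs 2 from every DC.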

Page 24

Latency + Throughput: W

[Diagram: write (key, value) goes to the commit log on disk and to the memtable in memory, then returns ok; the memtable is flushed to an sstable with its index on disk; a "clean" step follows the flush]

Minimize disk access

Immutability
- Data on disk doesn't change
- Saves IO sync locks
- Requires async compaction
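The write path can be sketched in a few lines: append to a commit log, update the memtable, and flush to an immutable sstable when the memtable fills up. The `Node` class and its `memtable_limit` parameter are hypothetical, everything lives in memory, and clearing the commit log on flush is an assumption about what the "clean" step means.

```python
class Node:
    def __init__(self, memtable_limit=2):
        self.memtable_limit = memtable_limit
        self.commit_log = []  # stands in for the sequential on-disk log
        self.memtable = {}    # in-memory, mutable
        self.sstables = []    # immutable, sorted, oldest first

    def write(self, key, value):
        self.commit_log.append((key, value))  # durability first
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()
        return "ok"

    def flush(self):
        # An sstable is written once, sorted by key, and never modified.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log.clear()  # flushed data no longer needs the log

    def compact(self):
        """Merge all sstables into one; newer values override older ones."""
        merged = {}
        for sstable in self.sstables:
            merged.update(sstable)
        self.sstables = [tuple(sorted(merged.items()))]
```

Because sstables are never updated in place, overwritten values pile up across files until `compact` merges them, which is why compaction has to run asynchronously.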

Page 25

Latency + Throughput: R

[Diagram: read (key) checks the memtable in memory and the sstables with their indexes on disk, returning ok: X; the commit log still receives writes]

Caches

Bloom filters
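A hedged sketch of why Bloom filters help reads: each sstable keeps a small filter that can say "definitely not here", so the read skips the disk access for that table. The `BloomFilter`, `SSTable` and `read` names, the filter size and the hash scheme are all illustrative assumptions.

```python
import hashlib


class BloomFilter:
    def __init__(self, size=1024, hashes=3):
        self.size = size
        self.hashes = hashes
        self.bits = [False] * size

    def _positions(self, key):
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for position in self._positions(key):
            self.bits[position] = True

    def might_contain(self, key):
        # False means certainly absent; True means possibly present.
        return all(self.bits[p] for p in self._positions(key))


class SSTable:
    def __init__(self, rows):
        self.rows = dict(rows)
        self.bloom = BloomFilter()
        self.disk_reads = 0
        for key in self.rows:
            self.bloom.add(key)

    def get(self, key):
        if not self.bloom.might_contain(key):
            return None  # skipped without touching "disk"
        self.disk_reads += 1
        return self.rows.get(key)


def read(key, memtable, sstables):
    """Check the memtable first, then sstables from newest to oldest."""
    if key in memtable:
        return memtable[key]
    for sstable in reversed(sstables):
        value = sstable.get(key)
        if value is not None:
            return value
    return None
```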

Page 26

Flexible data model

             name      email
user[“1”]    Julius    [email protected]
user[“2”]    Marcus    [email protected]

Simpler schema changes

             name      email               state
user[“1”]    Julius    [email protected]    stabbed

Flexible (good in growth mode)

NAT[“192.168.1.2:80:10.1.1.1:923”] = {
    ip = “192.12.3.11”,
    port = “455”,
    ttl = ....
}

Page 27

Time series

Column names are stored physically sorted
● Wide rows enable ordering + efficient filtering
● Pack together data that will be queried together

Events (bad) <- applies SQL approach

event[id] = {device=1, time=t1, val=1}

event[id] = {device=1, time=t2, val=2}

Events (better)

event[device1] = { {time=t2, val=2}, {time=t1, val=3} .. }

event[device2] = { {time=t3, val=3}, {time=t4, val=4} .. }
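The "better" layout can be sketched as one wide row per device whose columns are kept sorted by time, so a time-range query becomes a contiguous slice. `WideRowStore` is a hypothetical in-memory model using `bisect`, not Cassandra's storage engine.

```python
import bisect
from collections import defaultdict


class WideRowStore:
    def __init__(self):
        # device -> list of (time, val) columns kept sorted by time
        self.rows = defaultdict(list)

    def insert(self, device, time, val):
        bisect.insort(self.rows[device], (time, val))

    def slice(self, device, start, end):
        """Return the columns with start <= time <= end, in time order."""
        columns = self.rows[device]
        lo = bisect.bisect_left(columns, (start,))
        hi = bisect.bisect_right(columns, (end, float("inf")))
        return columns[lo:hi]
```

All events of a device sit together and in order, so the query touches one row and a contiguous range instead of scanning rows keyed by event id.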

Page 28

Things to watch

● Data model highly conditioned by queries, vs. SQL's model for many possible queries

● Relearn performance tuning: GC, caches, IO patterns, repairs... Understanding internals is as important as in SQL

● Counterintuitive internals, e.g. expired data doesn't get deleted immediately (not even “soon”)

● ...
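The point about expired data can be illustrated with a small model: an expired cell stops being visible to reads, but it stays in storage until a later compaction, and only after a grace period. `TtlStore` and its `gc_grace` parameter are hypothetical stand-ins and time is passed in explicitly to keep the sketch deterministic.

```python
class TtlStore:
    def __init__(self, gc_grace=10):
        self.gc_grace = gc_grace
        self.cells = {}  # key -> (value, expires_at)

    def put(self, key, value, ttl, now):
        self.cells[key] = (value, now + ttl)

    def get(self, key, now):
        cell = self.cells.get(key)
        if cell is None or cell[1] <= now:
            return None  # expired cells are hidden from readers
        return cell[0]

    def compact(self, now):
        """Physically drop cells that expired more than gc_grace ago."""
        self.cells = {
            key: cell
            for key, cell in self.cells.items()
            if cell[1] + self.gc_grace > now
        }
```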

Page 29

Things to watch

Know well how your clients handle failovers, and tune for your use case.

E.g. when we process a packet we want low latency and no failures, so:

● How long is a timeout?
● Retry on a different node or fail fast?
● How to distinguish node failure from a transient latency spike?
● How many nodes must be up to satisfy the CL?
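One way to reason about these questions is a client-side policy with a per-attempt timeout and an overall latency budget: retry on another node while there is budget left, otherwise fail fast. The sketch below is a simulation under assumptions: nodes are hypothetical callables that return `(latency, response)` or raise `ConnectionError`, and no real driver API is used.

```python
def query_with_failover(nodes, request, timeout, budget):
    """Try nodes in order without exceeding the overall latency budget."""
    spent = 0.0
    for node in nodes:
        if spent + timeout > budget:
            break  # fail fast: no room left for another attempt
        try:
            latency, response = node(request)
        except ConnectionError:
            continue  # node is down: move on at no cost in this model
        if latency <= timeout:
            return response
        spent += timeout  # gave up waiting on a slow node
    raise TimeoutError("no node answered within the latency budget")
```

A slow node costs a full timeout before the next attempt, so the timeout and the budget together decide how many retries a packet can afford.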

Page 30

Zookeeper

“Because coordinating distributed systems is a Zoo”

Watch data changes

Service discovery

Coordination

https://zookeeper.apache.org/

Page 31

Zookeeper

● High availability

● Performance (in memory, reads faster than writes)
  In memory: limits dataset size (backed by disk)

● Reliable delivery
  If a node sees an update, all nodes eventually will

● Total & causal order
  - Data is delivered in the same order it is sent
  - A message m is delivered only after all messages sent before m have been delivered

Page 32

Zookeeper architecture

[Diagram: 1. an update reaches the leader (L); 2. the leader sends a proposal to the followers; 3. the followers ack; 4. the leader sends a commit]

● Paxos variant
● Ordered messages
● Atomic broadcasts

● The leader is not a single point of failure: a new one is elected upon failure
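The update, proposal, ack and commit steps can be sketched as a toy leader that commits an update only when a majority of the ensemble has acknowledged it. ZooKeeper's actual protocol is Zab; the `Leader` and `Follower` classes below are hypothetical and leave out leader election, recovery and networking.

```python
class Follower:
    def __init__(self):
        self.up = True
        self.pending = {}  # zxid -> proposed update
        self.log = []      # committed (zxid, update) pairs, in order

    def propose(self, zxid, update):
        if not self.up:
            return False
        self.pending[zxid] = update
        return True  # ack

    def commit(self, zxid):
        if self.up and zxid in self.pending:
            self.log.append((zxid, self.pending.pop(zxid)))


class Leader:
    def __init__(self, followers):
        self.followers = followers
        self.next_zxid = 1  # transaction ids give a total order
        self.log = []

    def update(self, value):
        zxid = self.next_zxid
        self.next_zxid += 1
        # The leader counts as one vote of the ensemble.
        acks = 1 + sum(f.propose(zxid, value) for f in self.followers)
        ensemble = len(self.followers) + 1
        if acks <= ensemble // 2:
            return False  # no majority, nothing is committed
        self.log.append((zxid, value))
        for follower in self.followers:
            follower.commit(zxid)
        return True
```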

Page 33

ZK Watchers

/midonet
    /bridges
        /A
            /ports
                /1 = [.., peer = bridgeC/ports/79, .. ]
                /2 = [.., peer = routerX/ports/53, .. ]
        /B
            /ports
                /79 = [.., peer = bridgeA/ports/1, .. ]
    /routers
        ...

[Diagram annotations: "Change here" on one node, "Notifies subscribers of these" on the affected nodes]
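A minimal in-memory sketch of the watcher idea: a reader registers a watch on a path when it reads it, and a later change to that path notifies only the watchers of that path. As in ZooKeeper, watches here are one-shot and must be re-registered after they fire. `ZNodeTree` is a hypothetical class, not a ZooKeeper client.

```python
class ZNodeTree:
    def __init__(self):
        self.data = {}     # path -> value
        self.watches = {}  # path -> callbacks waiting for the next change

    def get(self, path, watch=None):
        """Read a node, optionally leaving a one-shot watch on it."""
        if watch is not None:
            self.watches.setdefault(path, []).append(watch)
        return self.data.get(path)

    def set(self, path, value):
        self.data[path] = value
        # Fire and discard the watches registered on this path only.
        for callback in self.watches.pop(path, []):
            callback(path)
```

A host interested in port 1 of bridge A would call `get("/midonet/bridges/A/ports/1", watch=...)` and is not disturbed by changes elsewhere in the tree.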

Page 34

ZK Watchers

[Diagram: hosts A, B and C, each running VMs. Change: cut and add new device. Binding changes trigger "update A!", "update B!" and "update C!"]

Important: we want to notify each node of relevant changes only!

Page 35

Remember the scale!

Page 36

Service discovery (WIP)

[Diagram: distributed service nodes n1, n2 and n3 register under /nodes (/nodes/n1, /n2, /n3); clients (c) discover them and are notified when a node goes down]

Ephemeral nodes: if the session that created one dies, the node disappears

Clients must know the ZK cluster (static) but not the service nodes (dynamic)
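The register, discover and notify-down flow can be sketched with an in-memory registry where each entry belongs to a session and vanishes when that session dies. `Registry` is a hypothetical class; unlike real ZooKeeper watches, the watchers here stay registered after firing, a simplification to keep the example short.

```python
class Registry:
    def __init__(self):
        self.ephemeral = {}  # path -> id of the session that created it
        self.watchers = []   # client callbacks receiving the live node list

    def register(self, session_id, path):
        """A service node announces itself with an ephemeral entry."""
        self.ephemeral[path] = session_id
        self._notify()

    def session_died(self, session_id):
        """Entries created by a dead session disappear with it."""
        self.ephemeral = {
            path: owner
            for path, owner in self.ephemeral.items()
            if owner != session_id
        }
        self._notify()

    def discover(self, watcher=None):
        """Return the live service nodes, optionally subscribing to changes."""
        if watcher is not None:
            self.watchers.append(watcher)
        return sorted(self.ephemeral)

    def _notify(self):
        live = sorted(self.ephemeral)
        for watcher in self.watchers:
            watcher(live)
```

Clients only need the registry's address; the set of service nodes is learned and refreshed at runtime.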

Page 37

Q ? A : Thank you!