scylladb @ apache bigdata, may 2016
TRANSCRIPT
Capable of 1,800,000 operations per secondPER NODE
With predictable, low latenciesCompatible with Apache Cassandra
Scylla: A new Open Source NoSQL Database
BACKGROUND
SQL: Structured, no scale
Document store: No structureSome scale
Column store: Some structureScale outAwesome HA/DR
Key-value: SimpleScaleNot a real DB
Benchmark configuration● Server type: Rackspace Bare Metal IO Class v1● CPU: Dual 2.8 GHz, 10 core Intel® Xeon® E5-2680 v2● RAM: 128 GB● Networking: Redundant 10 Gb/s connections in a high availability bond● Data Disks: 2 * 1.6 TB PCIe flash cards● OS: CentOS 7.2.1511, Kernel version: 3.10.0-327.10.1.el7.x86_64● Java
○ Cassandra - Oracle jdk-8u65○ Scylla - Open JDK 1.8 (used only for scylla-jmx)
Source: http://www.scylladb.com/technology/ycsb-cassandra-scylla/
WHAT WOULD YOU DO WITH 1 MILLION TPS?Shrink your cluster by a factor of X10Handle 10X traffic spikes on Black FridayFaster repairs, faster scale out.Get the most out of your data - Run more queriesAdministration operations while servingStop using caches in front of the database
SCYLLA IS QUITE DIFFERENTShard-per-core, no locks, no threads, zero-copyReactor programing with C++14Our own efficient, DB-aware cache, not using Linux page cacheBetter storage engineMax out all HW resources - NUMA friendly, multiqueue NICs, etcUserspace I/O schedulerBased on Seastar project
SCYLLA ARCHITECTURE COMPARISON
● KVM was invented by Avi in 2006, development was managed by Dor● It was a new hypervisor after VMW, Xen had dominated the market● By smart design choices and leveraging Linux and the hardware it became the most
performing hypervisor.○ KVM holds SPECvirt performance record○ KVM holds max IOPS record
● The Open Virtualization Alliance includes hundreds of companies, including HP, IBM, Intel, AMD, Red Hat, etc
● KVM is the engine behind many clouds such as OpenStack, IBM, NTT, Fujitsu, HP, Google, DigitalOcean, etc.
Cassandra
TCP/IPScheduler
queuequeuequeuequeuequeuethreads
NICQueues
Kernel
Traditional stack Seastar’s sharded stack
Memory
Lock contentionCache contentionNUMA unfriendly
Application
TCP/IP
Task Schedulerqueuequeuequeuequeuequeuesmp queue
NICQueue
DPDK
Kernel (isn’t
involved)
Userspace
Application
TCP/IP
Task Schedulerqueuequeuequeuequeuequeuesmp queue
NICQueue
DPDK
Kernel (isn’t
involved)
Userspace
Application
TCP/IP
Task Schedulerqueuequeuequeuequeuequeuesmp queue
NICQueue
DPDK
Kernel (isn’t
involved)
Userspace
No contentionLinear scalingNUMA friendly
Kernel
CoreDatabase
Task Schedulerqueuequeuequeuequeuequeuesmp queue
Userspace
Scylla has its own task schedulerTraditional stack Scylla’s stack
Promise
Task
Promise
Task
Promise
Task
Promise
Task
CPU
Promise
Task
Promise
Task
Promise
Task
Promise
Task
CPU
Promise
Task
Promise
Task
Promise
Task
Promise
Task
CPU
Promise
Task
Promise
Task
Promise
Task
Promise
Task
CPU
Promise
Task
Promise
Task
Promise
Task
Promise
Task
CPU
Promise is a pointer to eventually computed value
Task is a pointer to a lambda function
Scheduler
CPU
Scheduler
CPU
Scheduler
CPU
Scheduler
CPU
Scheduler
CPU
Thread
Stack
Thread
Stack
Thread
Stack
Thread
Stack
Thread
Stack
Thread
Stack
Thread
Stack
Thread
Stack
Thread is a function pointer
Stack is a byte array from 64k to megabytes
Context switch cost is
high. Large stacks pollutes
the caches No sharing, millions of
parallel events
Unified cacheCassandra Scylla
Key cache
Row cache
On-heap /Off-heap
Linux page cache
SSTables
Unified cache
SSTables
Unified cacheCassandra Scylla
Key cache
Row cache
On-heap /Off-heap
Linux page cache
SSTables
Unified cache
SSTables
Page faultsParasitic rows
Tuning
Scylla has an I/O scheduler
Traditional stack
Scylla stackMax useful disk concurrency
I/O scheduled
by priority
here
Source: http://www.scylladb.com/2016/04/14/io-scheduler-1/
total, 14825839, 25002, 25002, 25002, 0.5, 0.3, 0.5, 5.0, 12.8, 22.2, 592.6, 0.00076total, 14851605, 24980, 24980, 24980, 0.5, 0.3, 0.6, 6.7, 12.9, 21.6, 593.6, 0.00076total, 14877443, 25004, 25004, 25004, 0.5, 0.3, 0.5, 6.5, 17.5, 38.8, 594.7, 0.00076total, 14903361, 25017, 25017, 25017, 0.5, 0.3, 0.5, 6.9, 29.9, 39.9, 595.7, 0.00076total, 14927655, 23553, 23553, 23553, 4.6, 0.3, 34.3, 66.8, 203.0, 255.9, 596.7, 0.00076total, 14956055, 26384, 26384, 26384, 5.0, 0.4, 27.2, 53.9, 81.5, 99.9, 597.8, 0.00077total, 14981910, 24987, 24987, 24987, 0.5, 0.3, 0.7, 6.2, 13.5, 25.0, 598.8, 0.00077total, 15007673, 25003, 25003, 25003, 0.4, 0.3, 0.5, 3.7, 12.5, 24.5, 599.9, 0.00077total, 15033484, 25006, 25006, 25006, 0.4, 0.3, 0.5, 3.8, 12.4, 32.8, 600.9, 0.00077total, 15059256, 25004, 25004, 25004, 0.4, 0.3, 0.5, 2.2, 14.9, 33.0, 601.9, 0.00076total, 15085126, 24994, 24994, 24994, 0.4, 0.3, 0.5, 2.4, 10.4, 19.4, 603.0, 0.00076total, 15110948, 24988, 24988, 24988, 0.5, 0.3, 0.6, 4.1, 10.3, 19.9, 604.0, 0.00076
Compatibility (and speed): Repair
Jan 13 17:33:44 server-02.localdomain scylla[7730]: [shard 27] stream_session - [Stream #cbbf8dc1-ba1b-11e5-a51a-00000000000b ID#0] Creating new streaming plan for repair-inJan 13 17:33:44 server-02.localdomain scylla[7730]: [shard 27] stream_session - [Stream #cbbf8dc1-ba1b-11e5-a51a-00000000000b ID#0] Received streaming plan for repair-inJan 13 17:33:45 server-02.localdomain scylla[7730]: [shard 27] stream_session - [Stream #cbc02a01-ba1b-11e5-a51a-00000000000b ID#0] Creating new streaming plan for repair-inJan 13 17:33:45 server-02.localdomain scylla[7730]: [shard 27] stream_session - [Stream #cbc02a01-ba1b-11e5-a51a-00000000000b ID#0] Received streaming plan for repair-in
SCYLLA as an INFRASTRUCTUREScale up with the number of coresKernel bypass for direct networking and block I/OGood match for upcoming Non Volatile Memory technologyHigh availability with gossip and flexible replicationRuns everywhere: Physical, virtual, containersCan be integrated with microservices with its own httpd
❏ Build a community❏ Core database improvements❏ VERTICAL: Spark, Solr, distributed SQL engines❏ HORIZONTAL: Microservice integration, more
protocols❏ Upcoming releases: Scylla 1.0.3, Scylla 1.1
WHAT’S NEXT?