introducing multi-dimensional scaling: independent scalability for query and indexing performance...
Multi Dimensional Scaling with Couchbase Server 4.0
Cihan Biyikoglu | Dir. Product Management, Couchbase
©2015 Couchbase Inc. 2
Agenda
Brief History of Scaling
– Scaling up and out
NoSQL Workloads and Scalability Model
– Core Data operations, Indexing and Query
Introducing Multi Dimensional Scalability
– Services and Independent Scalability
Demo
Q&A
History of Scaling
Question
A few million people are looking for a setup in which to live and interact efficiently. What is the most efficient way to build this infrastructure?
A) Build one giant high-rise?
B) Build some mid-rises?
C) Build many single-family homes?
Scaling Up
Build one big high-rise
Vertical Scaling
– Cluster processors – hyper-threading to cores
– Locally partition workload among processors
– Communicate over memory
Great for fast processing but limited in scalability and elasticity
Scaling Out
Build a large community of single-family houses
Horizontal Scaling
– Cluster commodity HW
– Partition workload among nodes
– Communicate over network
Great for scaling and elasticity but slower communication
So what is the right model?
NoSQL Workloads & Scalability Model
NoSQL Workloads
One Database, Many Workloads
– Core Data Processing: GETs & SETs for a given key
– Indexing: Index maintenance and lookups
– Querying: Combine index and data with complex just-in-time data re-shaping, ordering, grouping, aggregations and more
Varying resource requirements: CPU, RAM, I/O, Network
Varying methods to optimize latency & throughput for each
Scalability Model Today
Homogeneous Scaling
– Each node gets a slice of the workload
– Simple to do…
But...
• Workloads compete and interfere with each other
• Can't fine tune each workload
  - Core Data operations are partition-able, so they do well with wider fan-out
  - Indexing and Query are not always partition-able, so they can do worse with wider fan-out
[Diagram: Couchbase Cluster, node1 … node8; Data Service, Query Service, Index Service]
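The fan-out point above can be illustrated with a minimal sketch. It assumes a hypothetical cluster in which a key-based operation is routed by hash to exactly one node, while a lookup against an index spread over all nodes has to ask every node. The function names and the CRC32 routing are illustrative assumptions, not Couchbase APIs.

```python
import zlib

def node_for_key(key, num_nodes):
    """Route a key to exactly one node (illustrative hash partitioning)."""
    return zlib.crc32(key.encode("utf-8")) % num_nodes

def nodes_touched_by_key_op(key, num_nodes):
    # A GET/SET for a known key only talks to the node that owns the key.
    return {node_for_key(key, num_nodes)}

def nodes_touched_by_scatter_gather(num_nodes):
    # A lookup against an index spread over every node has to ask all of them.
    return set(range(num_nodes))

for n in (2, 8, 32):
    kv = len(nodes_touched_by_key_op("user::1001", n))
    sg = len(nodes_touched_by_scatter_gather(n))
    print(f"{n} nodes: key op touches {kv}, scatter-gather touches {sg}")
```

The key operation stays at one node however wide the cluster gets, while the scatter-gather cost grows with the node count.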
Introducing Multi Dimensional Scalability
Modern Architecture
What is Multi-Dimensional Scalability?
MDS is the architecture that enables independent scaling of data, query and indexing workloads.
[Diagram: Couchbase Cluster, node1 … node8; Data Service, Query Service, Index Service]
Modern Architecture
Isolated Service for minimized interference
– Independent “zones” for Query, Index and Data Services
Minimize indexing and query overhead on core KV operations
[Diagram: Couchbase Cluster, node1 … node8; Data Service (Views and Geo Views), Index Service (Global Secondary Indexes), Query Service]
Modern Architecture
Independent Scalability for Best Computational Capacity per Service
Heavier indexing (index more fields): scale up index service nodes
More RAM for query processing: scale up query service nodes
[Diagram: Couchbase Cluster, node1 … node8; Data Service, Index Service, Query Service]
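As a rough sketch of what independent scaling means, the model below tags each node with one service and resizes only the nodes of the service that needs more capacity. The `Node` class, the `scale_up` helper and all hardware figures are made-up illustrations, not Couchbase settings.

```python
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    service: str   # "data", "index" or "query"
    ram_gb: int    # hypothetical hardware figures
    cores: int

def scale_up(nodes, service, ram_gb=None, cores=None):
    """Return a new node list where only nodes of `service` get bigger hardware."""
    resized = []
    for n in nodes:
        if n.service == service:
            n = Node(n.name, n.service, ram_gb or n.ram_gb, cores or n.cores)
        resized.append(n)
    return resized

cluster = [Node(f"node{i}", "data", 16, 4) for i in range(1, 5)]
cluster += [Node("node5", "index", 16, 4), Node("node6", "query", 16, 4)]

cluster = scale_up(cluster, "index", cores=16)   # heavier indexing
cluster = scale_up(cluster, "query", ram_gb=64)  # more RAM for query processing
for n in cluster:
    print(n)
```

The data nodes keep their original sizing throughout, which is the property the slide describes.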
Under the Hood
Services Architecture
Data, Index & Query
Couchbase Server 4.0 - Cluster Architecture
[Diagram: six nodes, Couchbase Server 1 through Couchbase Server 6. Each node shows a Cluster Manager, Data Service, Index Service, Query Service, Managed Cache, and Storage holding shards.]
Couchbase Server 4.0 - Cluster Architecture
[Diagram: six nodes, Couchbase Server 1 through Couchbase Server 6. Each node shows a Cluster Manager, Data Service, Index Service, Query Service, Managed Cache, and Storage holding shards.]
DEMO
Connectivity
Connectivity and Client Libraries
Type | Port | Endpoint
REST | 8091, 18091 | Admin Connections. Pointed at any node in the cluster
REST | 8092, 18092 | Query with View. Load balanced across nodes of the cluster that run the data service
REST | 8093, 18093 | Query with N1QL. Load balanced across nodes of the cluster that run the query service
ONLINE | 11210, 11207 | Core Data Operations. Stateful connections from the client app to nodes of the cluster that run the data service
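The port layout can be captured in a small lookup, sketched below. The helper name `endpoint` and the host name are illustrative. The view port is taken to be 8092, and the second port of each pair is treated as the encrypted variant, which the slide does not spell out.

```python
# (plain, encrypted) port pairs per connection type.
SERVICE_PORTS = {
    "admin": (8091, 18091),   # REST, any node in the cluster
    "views": (8092, 18092),   # REST, nodes running the data service
    "n1ql":  (8093, 18093),   # REST, nodes running the query service
    "kv":    (11210, 11207),  # core data operations, data service nodes
}

def endpoint(host, service, secure=False):
    """Build a host:port string for a service (helper name is illustrative)."""
    plain, encrypted = SERVICE_PORTS[service]
    return f"{host}:{encrypted if secure else plain}"

print(endpoint("node1.example.com", "n1ql"))
print(endpoint("node1.example.com", "kv", secure=True))
```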
Connectivity and Client Libraries
Connectivity Phases
1. Auth
2. Discovery
   • Get cluster map
3. Service Connection
   • Auth to Service
   • Run operation
   • If (topology_change)
     • Rerun #2
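The three phases can be sketched as a small client loop. Everything here is an in-memory stand-in: `FakeCluster`, `Client`, the credentials and the map format are hypothetical and are not part of any Couchbase SDK. The sketch also folds "Auth to Service" into the initial authentication for brevity.

```python
class TopologyChanged(Exception):
    """Raised by the fake cluster when the client's cluster map is stale."""

class FakeCluster:
    """In-memory stand-in for a cluster; not a real client API."""
    def __init__(self):
        self.map_version = 1

    def authenticate(self, user, password):
        return user == "app" and password == "secret"

    def cluster_map(self):
        return {"version": self.map_version, "data_nodes": ["node1", "node2"]}

    def run(self, op, client_map):
        if client_map["version"] != self.map_version:
            raise TopologyChanged()
        return f"{op} ok on {client_map['data_nodes'][0]}"

class Client:
    def __init__(self, cluster, user, password):
        if not cluster.authenticate(user, password):    # 1. Auth
            raise PermissionError("auth failed")
        self.cluster = cluster
        self.cmap = cluster.cluster_map()               # 2. Discovery

    def execute(self, op, max_retries=3):
        for _ in range(max_retries):
            try:
                return self.cluster.run(op, self.cmap)  # 3. Service connection
            except TopologyChanged:
                self.cmap = self.cluster.cluster_map()  # rerun #2
        raise RuntimeError("topology kept changing")

cluster = FakeCluster()
client = Client(cluster, "app", "secret")
print(client.execute("GET user::1"))
cluster.map_version += 1               # simulate a topology change
print(client.execute("GET user::1"))   # stale map is refreshed, then retried
```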
Discovery and Cluster Map
Discovery and Cluster Map – 2 New Nodes
Replication
Database Change Protocol (DCP)
Fast Streaming Replication
DCP - An open streaming protocol that conveys the consistent database state to all consumers
– Ordering (vbucket based seq. number)
– Re-startable, Resumable (version histories and rollbacks)
– Consistent (snapshots)
– High Performance (memory based with dedup)
[Diagram labels: Source Cluster (Master, Local Replica, Index, Map/Reduce); Cross Data Center Cluster (Remote Replica, Index, Map/Reduce); Integration (Hadoop); Tooling (Backup/Export); Client/Application Notification (in future)]
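Three of the DCP properties listed above (ordering by sequence number, resumability and dedup) can be shown with a toy per-vBucket change log. This is only an illustration of the ideas: `VBucketLog` is a hypothetical class and has nothing to do with the actual DCP wire protocol, snapshots or rollback handling.

```python
from collections import OrderedDict

class VBucketLog:
    """Toy per-vBucket change log; illustrative only."""
    def __init__(self):
        self.seqno = 0
        self.latest = OrderedDict()   # key -> (seqno, value), kept in seqno order

    def mutate(self, key, value):
        self.seqno += 1
        # Dedup: keep only the newest version of a key, moved to the end.
        self.latest.pop(key, None)
        self.latest[key] = (self.seqno, value)

    def stream(self, start_seqno=0):
        """Yield (seqno, key, value) with seqno > start_seqno, in seqno order."""
        for key, (seqno, value) in self.latest.items():
            if seqno > start_seqno:
                yield seqno, key, value

log = VBucketLog()
log.mutate("a", 1)
log.mutate("b", 2)
log.mutate("a", 3)                    # supersedes the first version of "a"
first = list(log.stream())
last_seen = first[-1][0]              # a consumer remembers where it stopped

log.mutate("c", 4)
resumed = list(log.stream(start_seqno=last_seen))
print(first, resumed)
```

A consumer that reconnects only needs its last seen sequence number to pick up where it left off.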
Cluster Manager
Cluster Manager
Cluster Manager = Governor of the Cluster
Manages cluster level operations and coordination among nodes
– Cluster Membership & Service Layout
– Node Status & Failover
– Data Placement & Rebalance
– Auth
Cluster Manager
Inside Cluster Manager
Layers: per-node-&-bucket services, generic distributed facilities, generic local facilities
Components:
– Master Services: cluster level operations, data placement, rebalancer, auto-failover
– Per-node Services: Heartbeats, Babysitter
– Bucket services: dcp init and teardown, stats collectors
– Admin Portal – REST API
– Global Config (gossip replication)
– Local Config Store
– distributed node discovery
– Logging and Other Services
– Auth
Adding Nodes to Cluster Online
[Diagram: Couchbase Server 1, 2 and 3 hold ACTIVE and REPLICA shards and serve READ/WRITE/UPDATE traffic; Couchbase Server 4 and 5 are added, and ACTIVE and REPLICA shards are spread across all five servers.]
Cluster Manager receives the new nodes
- Nodes inherit cluster settings
- Move active and replica vbuckets using DCP
- As vbuckets catch up, initiate online handoff from “existing node” to “new node”
Clients Receive Topology Change Notification
- Trap not_my_vbucket errors
- Refresh cluster map and retry operation
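The client side of the handoff can be sketched as follows: the client tries the node it believes owns the vBucket, traps the not_my_vbucket error, refreshes its map and retries. `DataNode`, `NotMyVBucket` and `get_with_retry` are hypothetical names for this illustration, and the rebalance is simulated by hand.

```python
class NotMyVBucket(Exception):
    """Stand-in for the not_my_vbucket error a node returns after a handoff."""

class DataNode:
    def __init__(self, name):
        self.name = name
        self.active = set()   # vBucket ids this node currently owns
        self.store = {}

    def get(self, vbucket, key):
        if vbucket not in self.active:
            raise NotMyVBucket(vbucket)
        return self.store.get(key)

def get_with_retry(key, vbucket, client_map, fetch_cluster_map, nodes):
    """Try the cached owner; on not_my_vbucket refresh the map and retry once."""
    try:
        return nodes[client_map[vbucket]].get(vbucket, key)
    except NotMyVBucket:
        client_map.update(fetch_cluster_map())   # refresh cluster map
        return nodes[client_map[vbucket]].get(vbucket, key)

old, new = DataNode("server1"), DataNode("server4")
nodes = {"server1": old, "server4": new}
old.active.add(7)
old.store["user::1"] = "alice"
client_map = {7: "server1"}

# Simulated rebalance: vBucket 7 is handed off from the existing node to the new node.
new.store.update(old.store)
new.active.add(7)
old.active.discard(7)

value = get_with_retry("user::1", 7, client_map, lambda: {7: "server4"}, nodes)
print(value, client_map)
```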
Data Service
Data Service
Data Service = GET/SET + Map-Reduce Views*
Tackles fast core data operations with efficient caching and disk persistence
Core Database Operations
– Core GET/SET operations
– Couchstore Based Storage
Terms:
Bucket = database that resides within a cluster
vBucket = hash partition of the database that resides within a node
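To make the vBucket term concrete, here is a minimal sketch of hash partitioning. The slide does not give the hash function or the number of partitions, so the CRC32-modulo scheme, the count of 1024 and the round-robin placement are assumptions for illustration only.

```python
import zlib

NUM_VBUCKETS = 1024  # assumed partition count, for illustration

def vbucket_for_key(key, num_vbuckets=NUM_VBUCKETS):
    """Hash a document key to a vBucket id (CRC32 modulo; an assumption)."""
    return zlib.crc32(key.encode("utf-8")) % num_vbuckets

def node_for_key(key, vbucket_map):
    """vbucket_map[i] names the node holding the active copy of vBucket i."""
    return vbucket_map[vbucket_for_key(key, len(vbucket_map))]

# Spread the vBuckets round-robin over three hypothetical nodes.
nodes = ["node1", "node2", "node3"]
vbucket_map = [nodes[i % len(nodes)] for i in range(NUM_VBUCKETS)]
print(vbucket_for_key("user::1001"), node_for_key("user::1001", vbucket_map))
```

Because the key alone determines the vBucket, a client holding the map can go straight to the owning node without asking any other node.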
Data Manager Architecture
[Diagram labels: Listener; Auth; Network IO; Database Engine (ep-engine); vBucket Manager; Item Pager; Expiry Pager; Checkpoint Manager; Cache: Partition Hash Tables (Active and Replica); Flusher; Batch Reader; Scheduler: Reader IO, Writer IO, Non IO]
Query Service
Query Service
Query Service = N1QL
Tackles N1QL Query execution
– Query Execution
– N1QL Parser & Optimizer: tokenizes the N1QL statement and generates an execution plan that utilizes indexes
– Query Execution Engine: assigns resources to the query and coordinates query execution
– Data Sources: pluggable “data source driver” layer for accessing data sources in Couchbase Server (data and index service) and other external data providers
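A client reaches the query service over REST. The sketch below builds, but does not send, such a request with the Python standard library. The `/query/service` path and the `statement` form parameter are not shown on the slides and are stated here as assumptions about the query REST interface; the host name and the `travel` bucket are made up.

```python
import urllib.parse
import urllib.request

def build_n1ql_request(host, statement, port=8093):
    """Build (but do not send) a POST to the query service's REST endpoint."""
    body = urllib.parse.urlencode({"statement": statement}).encode("utf-8")
    return urllib.request.Request(
        url=f"http://{host}:{port}/query/service",   # assumed endpoint path
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )

req = build_n1ql_request("node1.example.com", "SELECT name FROM `travel` LIMIT 5")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))

# Sending it needs a reachable query node, so it is left commented out:
# with urllib.request.urlopen(req) as resp:
#     print(resp.read())
```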
Query Service
N1QL Query Processing
[Diagram: Query Engine listening on 8093/18093. Query Processor: Listeners, Parser, Optimizer, Execution Engine. Data Stores: Couchbase Server (Auth, Data, Indexers: GSI, Views), File system, Others… Couchbase Server side: Cluster Manager, Data Service (buckets), Index Service (Index#1, Index#2)]
Index Service
Index Service
Global Secondary Indexes (NEW in 4.0)
Tackles indexing for fast query execution with efficient index maintenance for N1QL Queries
– High Performance Indexing
– Projector and Router: coordinate and communicate efficient index change notifications between data service and index service
– Supervisor – Indexer and Scanner
  Indexer: maintains a large number of indexes as change notifications arrive
  Scanner: responds to Query Service index-scan requests with a rich set of consistency dials
– Index Storage & Caching
  ForestDB: brand new storage engine for high performance index caching and storage
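The indexer and scanner roles can be illustrated with a toy single-field index. This is a sketch of the general idea only: `SecondaryIndex` is a hypothetical in-memory class and does not reflect how the index service or ForestDB store entries, nor does it model the consistency dials.

```python
import bisect

class SecondaryIndex:
    """Toy index on one field: sorted (field_value, doc_id) entries."""
    def __init__(self, field):
        self.field = field
        self.entries = []   # kept sorted so range scans are cheap
        self.by_doc = {}    # doc_id -> indexed value, to remove stale entries

    def on_change(self, doc_id, doc):
        """Indexer role: apply one change notification (doc=None means delete)."""
        if doc_id in self.by_doc:
            self.entries.remove((self.by_doc.pop(doc_id), doc_id))
        if doc is not None and self.field in doc:
            value = doc[self.field]
            bisect.insort(self.entries, (value, doc_id))
            self.by_doc[doc_id] = value

    def scan(self, low, high):
        """Scanner role: return doc ids whose value lies in [low, high]."""
        start = bisect.bisect_left(self.entries, (low,))
        return [d for v, d in self.entries[start:] if v <= high]

idx = SecondaryIndex("age")
idx.on_change("u1", {"name": "Ann", "age": 31})
idx.on_change("u2", {"name": "Bob", "age": 25})
idx.on_change("u1", {"name": "Ann", "age": 40})   # update moves the entry
idx.on_change("u2", None)                         # deletion removes it
print(idx.scan(30, 50))
```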
[Diagram labels: Data Service (Bucket#1, Bucket#2) with Projector & Router; DCP Stream; Indexing Service; Index Service with Supervisor (index maintenance & scan coordinator), Index#1 through Index#4, ForestDB Storage Engine; Query Service with Query Processor (cbq-engine)]
Projector and Router
– 1 Projector and Router per node
– 1 stream of changes per bucket per supervisor
Supervisor
– 1 Supervisor per node
– Many indexes per Supervisor
Recap
Recap
MDS enables unprecedented control on scalability with Couchbase Server
– Separate out competing workloads to independent services
– Independently scale each service “zone” within the cluster
Couchbase Server with MDS maximizes scalability and performance
– Improves scale and performance to degrees not possible with other NoSQL or Big Data engines on premise or in the cloud
– Improves price/performance and squeezes more performance and throughput for mission critical systems
Get Started with Couchbase Server 4.0 - Couchbase.com/Downloads
Q&A
Cihan Biyikoglu | [email protected] | @cihangirb