Preparing for Success: The Top Deployment Do’s and Don’ts – Couchbase Connect 2016
Post on 15-Feb-2017
©2016 Couchbase Inc.
Best Practices: Preparing for Success - Top Deployment Do’s & Don’ts
Ian McCloy & Karthik Sekar
Ian McCloy
Couchbase Technical Support Manager – EMEA
ian.mccloy@couchbase.com
Ian McCloy is the Technical Support Manager for Couchbase in EMEA. Ian is based in Couchbase’s Manchester U.K. engineering lab. Previously, Ian was a senior inventor and team lead with IBM’s System and Technology Group’s Storage Systems division. Ian specializes in OS platforms, virtualization, distributed systems, storage, and networking. Ian is a VMware certified engineer and has authored many patents in a diverse range of technologies from server hardware, virtualization, software testing, security, and language translation.
Karthik Sekar
Solutions Architect - WW Field Operations
karthik@couchbase.com
Karthik is a Solutions Architect, part of Professional Services - Worldwide Technical Field Operations at Couchbase, with expertise in Big Data, NoSQL technologies, cloud, and distributed computing platforms. He specializes in setting up, configuring, securing, tuning, architecting, and managing mission-critical systems. His current major responsibilities are providing architecture reviews and technical product assistance, and sharing best practices with customers.
Agenda
• OS Tuning
• Platform Tuning
• Bucket Recommendations
• Sizing- Why Sizing is Crucial for Success
• Best Practices – Views & Indexing
• The Breaking Point
OS Tuning (Linux)
Linux
Disable THP (Transparent Huge Pages)
Use XFS File System
Create Swap Space
Set Swappiness to 0
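The THP and swappiness items above can be verified on a host with a short script. The following is a minimal sketch, not an official Couchbase tool: the function names are hypothetical, and the two file paths are the usual kernel locations on most modern Linux distributions (some older distributions expose THP under a different path, so treat them as assumptions).

```python
from pathlib import Path

# Usual kernel locations for the two settings (assumed; may differ per distribution).
THP_PATH = Path("/sys/kernel/mm/transparent_hugepage/enabled")
SWAPPINESS_PATH = Path("/proc/sys/vm/swappiness")


def active_thp_mode(raw: str) -> str:
    """Return the bracketed (active) mode from text such as 'always madvise [never]'."""
    for token in raw.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError(f"no active mode found in {raw!r}")


def check_linux_tuning(thp_raw: str, swappiness_raw: str) -> list[str]:
    """Return a list of warnings; an empty list means both checks passed."""
    warnings = []
    if active_thp_mode(thp_raw) != "never":
        warnings.append("Transparent Huge Pages are not disabled")
    if int(swappiness_raw.strip()) != 0:
        warnings.append("vm.swappiness is not 0")
    return warnings


def check_this_host() -> list[str]:
    """Read the live kernel settings and run the checks (Linux only)."""
    return check_linux_tuning(THP_PATH.read_text(), SWAPPINESS_PATH.read_text())
```

On a Linux node, calling `check_this_host()` returns the outstanding warnings, which makes it easy to run before a node joins the cluster.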
OS Tuning (Windows)
Windows
Increase TCP ephemeral ports
Tune Virus Scanners
Tune Backup Utilities
Platform Tuning (Virtualization)
Virtual Machine
Avoid a Single Point of Failure
Do not over-commit your resources
Avoid Live Migration
Increase Auto-Failover Threshold
System Clocks
Ensure NTP (Network Time Protocol) is configured correctly before adding a node to the cluster
TTL (Time To Live) document expiry is handled per Couchbase node, so it depends on each node's system clock
Server Quota
Keep the per-node Server Quota at or below the best-practice maximum of 80% of total system RAM
Provision enough capacity before demand exceeds what the cluster can serve. Failing to do so can prevent normal recovery procedures
File-System Cache needs RAM
Proactive monitoring of workload and cluster capacity
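The 80% rule above is simple arithmetic. A minimal sketch follows; the function names are hypothetical, and the remaining 20% is what the slide leaves for the operating system and file-system cache.

```python
def max_server_quota_mb(total_ram_mb: int, max_percent: int = 80) -> int:
    """Largest per-node Server Quota (MB) allowed by the 80% guideline."""
    if total_ram_mb <= 0:
        raise ValueError("total_ram_mb must be positive")
    # Integer arithmetic avoids floating-point rounding surprises.
    return total_ram_mb * max_percent // 100


def quota_within_guideline(quota_mb: int, total_ram_mb: int) -> bool:
    """True when the configured quota does not exceed the guideline."""
    return quota_mb <= max_server_quota_mb(total_ram_mb)
```

For a node with 64 GB (65536 MB) of RAM, `max_server_quota_mb(65536)` gives 52428 MB.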
Bucket Tuning
Use as few buckets as possible.
Disable “Flush” on production buckets to prevent accidental data loss.
Replica Recommendations
As a rule of thumb, we recommend the following:
One replica for up to five data service nodes.
One or two replicas for five to ten nodes.
One, two, or three replicas for over ten nodes.
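The rule of thumb above can be written as a small lookup. This is a sketch only: the function name is hypothetical, and because the slide's ranges overlap at exactly five and exactly ten nodes, the code assumes those boundary values fall into the smaller tier.

```python
def recommended_replica_options(data_nodes: int) -> list[int]:
    """Replica counts suggested by the rule of thumb for a given data node count."""
    if data_nodes < 1:
        raise ValueError("data_nodes must be at least 1")
    if data_nodes <= 5:       # assumption: exactly 5 counts as "up to five"
        return [1]
    if data_nodes <= 10:      # assumption: exactly 10 counts as "five to ten"
        return [1, 2]
    return [1, 2, 3]
```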
Why Sizing Matters
One of the most common support issues is under-provisioned clusters
• Data Explosion: Explosive growth of applications for the Digital Economy - the web, mobile and IoT
• Performance Demands – Low-latency access
• Capacity Planning – Sufficient capacity to handle the peak loads
• High Availability – For today’s mission-critical apps, high availability is no longer a ‘nice to have’ but is essential.
Hardware Minimums
RAM: At least ~8 GB (highly dependent on dataset)
Disk: Fastest “local” storage available
- SSD is better
- RAID 0, 10
- Avoid SAN, if possible
CPU (minimums): 8 cores
+ 1 per bucket
+ 1 per design document (if Views are used)
+ 1 per 10K mutations/sec
+ 1 per XDCR stream
Hardware requirements/recommendations are the intersection of what’s needed versus what’s available.
Sizing Couchbase Server
• Multi-Dimensional Scalability (MDS) – Optionally scale each service independently:
  • Data
  • Index
  • Query
MDS is the architecture that enables independent scaling of data, query, and indexing workloads while being managed as one cluster.
Sizing Couchbase Server - Data
• Data Service:
  • Same as previous Couchbase Server 3.x version
  • Enough RAM to cache reads
  • Enough Disk to eventually persist writes
  • CPU primarily for Views and XDCR
  • At least 3 nodes – Replication at the bucket level
  • Minimum requirements: 4 GB RAM, 8 Cores CPU
Sizing Couchbase Server - Index
• Index service:
  • Primarily RAM and Disk IO bound
  • Index Types
    • Standard Global Secondary Indexes (GSI)
    • Memory Optimized Indexes (MOI)
  • At least 2 nodes for HA, each index replicated individually
  • Minimum Requirements: 32 GB RAM, 16 core CPU, “fast disk”, “as much RAM as you need for MOI”
Sizing Couchbase Server - Query
• Query Service:
  • Primarily CPU bound
  • Optimized for multi-core systems
  • Very low RAM and disk requirements
  • At least 2 nodes for HA – Queries automatically load balanced
  • Minimum Requirements: 8 GB RAM, 8+ Core CPU
How Many Nodes?
5 Key Factors determine number of nodes needed:
1) RAM
2) Disk
3) CPU
4) Network
5) Data Distribution/Safety
(per-bucket, multiple buckets aggregate)
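One way to combine the five factors is to size each one separately and then take the largest result, since the cluster has to satisfy every factor at once. The slides do not state this rule explicitly, so treat the sketch below as an assumption; the function names and the example numbers are hypothetical.

```python
import math


def nodes_for(requirement: float, per_node_capacity: float) -> int:
    """Nodes needed to cover one requirement, rounded up, with a minimum of one."""
    if per_node_capacity <= 0:
        raise ValueError("per_node_capacity must be positive")
    return max(1, math.ceil(requirement / per_node_capacity))


def nodes_needed(per_factor: dict[str, int]) -> tuple[int, str]:
    """Return (node count, limiting factor): the largest per-factor requirement."""
    limiting = max(per_factor, key=per_factor.get)
    return per_factor[limiting], limiting


# Hypothetical example: RAM turns out to be the limiting factor.
example = {"ram": 4, "disk": 2, "cpu": 3, "network": 1, "data_distribution": 3}
count, factor = nodes_needed(example)
```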
RAM sizing
1) Total RAM
§ Managed document cache:
  § Working set
  § Metadata
  § Active + Replicas
§ Index caching (I/O buffer)
Keep working set in RAM for best read performance
[Diagram: Reading Data. The Application Server asks "Give me document A"; the Server replies "Here is document A".]
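The managed document cache components listed above (working set, metadata, active plus replica copies) can be turned into a rough estimate. This is a simplified sketch under stated assumptions, not the formula from the Couchbase documentation: the function and parameter names are hypothetical, the per-item metadata size must be supplied by the caller, and index caching and any headroom are left out.

```python
def document_cache_bytes(
    item_count: int,
    avg_key_bytes: int,
    avg_value_bytes: int,
    metadata_bytes_per_item: int,  # assumption: look this up for your server version
    working_set_percent: int,      # share of values that should stay in RAM
    replicas: int,
) -> int:
    """Rough RAM estimate for metadata plus working-set values, active + replicas."""
    copies = 1 + replicas
    # Keys and metadata are counted for every item.
    metadata = item_count * (avg_key_bytes + metadata_bytes_per_item) * copies
    # Only the working-set share of the values is assumed to be cached.
    working_set = item_count * avg_value_bytes * working_set_percent // 100 * copies
    return metadata + working_set
```

With one million items, 40-byte keys, 1000-byte values, 60 bytes of metadata per item, a 20% working set and one replica, the estimate is 600,000,000 bytes.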
Working set depends on your application
• Resident item ratio shows the percentage of active documents that reside in memory. Typically you want your working set (actively accessed documents) to be in memory for low latencies and an awesome user experience.
• Recommended best practice for the working set (doc resident ratio): always over 20%
Metadata Overhead: indicates that a bucket is now using more than 50% of the allocated RAM for storing metadata and keys, reducing the amount of RAM available for data values. This is a helpful indicator that you may need to add nodes to your cluster.
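Both thresholds on this slide (resident ratio over 20%, metadata under 50% of allocated RAM) lend themselves to a simple monitoring check. The sketch below is illustrative only: the function name and inputs are hypothetical, and it assumes the caller has already collected the three numbers from the cluster's statistics.

```python
def memory_warnings(
    resident_ratio_percent: float,
    metadata_bytes: int,
    bucket_ram_quota_bytes: int,
) -> list[str]:
    """Return warnings for the two thresholds; an empty list means healthy."""
    warnings = []
    # The guideline is "always over 20%", so exactly 20% also warns.
    if resident_ratio_percent <= 20:
        warnings.append("resident ratio is not above 20%")
    # Metadata overhead: more than half of the allocated RAM holds keys and metadata.
    if metadata_bytes * 2 > bucket_ram_quota_bytes:
        warnings.append("metadata uses more than 50% of allocated RAM")
    return warnings
```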
Disk Sizing: Space and I/O
2) Disk
§ Sustained write rate
§ Rebalance capacity
§ Backups
§ XDCR
§ Views/Indexes
§ Compaction
§ Total dataset:
  § (active + replicas + indexes)
§ Append-only
[Diagram: Writing Data. The Application Server says "Please store document A"; the Server replies "OK, I stored document A". Side labels: "I/O" and "Space".]
Disk Sizing: Space and I/O
• Disk Writes are Buffered
  • Bursts of data expand the disk write queue
  • Sustained writes need corresponding throughput
• Disk throughput affected by disk speed
  • SSD > 10K RPM > EBS
  • SSDs give a huge boost to write throughput and startup/warmup times
  • RAID can provide redundancy and increase throughput
• Throughput = read/write + compaction + indexing + XDCR
• Best to configure different paths for data and indexes
• Plan on about 3x space (append-only, compaction, backups, etc.)
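The space and throughput guidance above reduces to two small calculations. A minimal sketch follows, with hypothetical function names; it assumes "total dataset" means active data plus replica copies plus indexes, as on the previous slide, and applies the "about 3x" multiplier to that total.

```python
def disk_space_needed_bytes(
    active_bytes: int,
    replicas: int,
    index_bytes: int,
    space_multiplier: int = 3,  # "about 3x" for append-only files, compaction, backups
) -> int:
    """Disk space to plan for, from the total dataset and the 3x guideline."""
    total_dataset = active_bytes * (1 + replicas) + index_bytes
    return total_dataset * space_multiplier


def required_throughput(read_write: int, compaction: int, indexing: int, xdcr: int) -> int:
    """Sustained disk throughput is the sum of all competing workloads (same unit)."""
    return read_write + compaction + indexing + xdcr
```

For example, 100 GB of active data with one replica and 50 GB of indexes gives a 250 GB dataset, so roughly 750 GB of disk to plan for.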
CPU Sizing
3) CPU
§ Disk writing
§ Views/compaction/XDCR
§ RAM r/w performance not impacted
§ Min. production requirement:
8 cores
+ 1 per bucket
+ 1 core per Design Doc (if Views are used)
+ 1 core per XDCR stream
+ 1 core for every 10K mutations/sec
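The minimum production requirement above is an additive formula, sketched here as a function. The function name is hypothetical, and rounding a partial block of 10K mutations/sec up to a whole core is an assumption; the slide does not say how to round.

```python
import math


def min_production_cores(
    buckets: int,
    design_docs: int = 0,        # only counted if Views are used
    xdcr_streams: int = 0,
    mutations_per_sec: int = 0,
    base_cores: int = 8,
) -> int:
    """Minimum cores per the slide: 8 + buckets + design docs + XDCR streams + mutation load."""
    # Assumption: any started block of 10K mutations/sec costs a full core.
    mutation_cores = math.ceil(mutations_per_sec / 10_000)
    return base_cores + buckets + design_docs + xdcr_streams + mutation_cores
```

A node serving 2 buckets, 3 design documents, 1 XDCR stream and 25K mutations/sec would need 8 + 2 + 3 + 1 + 3 = 17 cores under these assumptions.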
Network Sizing
4) Network
§ Client traffic
§ Replication (writes)
§ Rebalancing
§ XDCR
[Diagram: three Application Servers connected over the network to three Couchbase Servers. Labels: "Reads + Writes"; "Replication (multiply writes) and Rebalancing".]
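The "multiply writes" note suggests a first-order bandwidth estimate: client traffic carries every read and write once, and each write is sent again for every replica. The sketch below makes that assumption explicit; the function name is hypothetical, and rebalancing and XDCR traffic are deliberately left out.

```python
def steady_state_bytes_per_sec(
    reads_per_sec: int,
    writes_per_sec: int,
    avg_doc_bytes: int,
    replicas: int,
) -> int:
    """Rough cluster-wide network load: client traffic plus intra-cluster replication."""
    client_traffic = (reads_per_sec + writes_per_sec) * avg_doc_bytes
    # Assumption: each write is forwarded once per configured replica.
    replication_traffic = writes_per_sec * replicas * avg_doc_bytes
    return client_traffic + replication_traffic
```

With 10,000 reads/sec, 2,000 writes/sec, 1,000-byte documents and one replica, this comes to 14,000,000 bytes/sec before rebalancing or XDCR are considered.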
Network Considerations
• Low latency, high throughput (LAN) - within cluster
• Eliminate router hops:
  • Within Cluster nodes
  • Between clients and cluster
• Check who else is sharing the network
• Increase bandwidth by:
  • Adding more nodes (will scale linearly)
  • Upgrading routers/switches/NICs/etc.
Couchbase Data Access
• Everything is built on top of Key Value
• A Document store is a special case of Key-Value
• Views provide aggregation and real-time analytics through incremental map-reduce
• Global Secondary Indexes provide low latency/high throughput indexes
• N1QL is a language that provides a powerful and expressive way of accessing documents
Best Practices - Selection, Projection, Aggregation
• Try to avoid computing too many things in a View
• Check for attribute existence
• Pre-filter data to avoid unnecessary entries in the View
  • Use document types to make Views more selective
• Project (map) only necessary data by emitting it as part of the value
  • Do not emit the full document
  • Back-reference via the original document id
• Use the built-in reduce functions if possible
Number of Design Documents per Bucket
• Indexers are allocated per Design Document
• Bad cases
  • One Design Document contains all Views
    Ø A lot to do for the Indexer: all Views are updated at the same time
  • One View per Design Document
    Ø Resource intensive because there is one Indexer per View
• Good: a balance between these two extremes
Data Distribution
5) Data Distribution/Safety (assuming one replica):
§ 1 node = Single point of failure
§ 2 nodes = + Replication
§ 3+ nodes = Best to start with, for Test/Production
  § Auto-failover
  § Upgrade-ability
  § Further scale-ability
§ 7+ nodes = 3 Data nodes + 2 Index nodes + 2 Query nodes – Minimum for MDS (for Production)
§ Single node deployment is not for Test/Production; however, for development, it is fine to start with
Servers fail, be prepared. The more nodes, the less impact a failure will have.
As your Dataset Grows…
Effects on scale/sizing:
• Your RAM needs will grow:
  • Metadata needs increase with item count
  • Is your working set increasing?
• Your disk space will likely grow (duh?)
Indications:
• Dropping resident ratio
• Rising ejections/cache miss ratio
What to do:
• Revise sizing calculations, add more nodes
• Remove un-needed data
This is the most common need for scaling and will most likely result in needing more nodes
To Ensure Success…
• Test, Deploy, Monitor… rinse and repeat
• Drills for Failover Testing
  • Simulating Disk Failures, Network Failures, Hung Processes, Cluster Failures
• Proactive monitoring, efficient cluster, and process management
• Define the SLA, RPO, RTO
• Stress testing at N times the load of normal operations
• Isolate systems that interfere with performance
Sizing is tricky business…
• Work with the Couchbase Team
• Validate your “on-paper” numbers with testing
• Constantly monitor production
• Gather your workload and dataset requirements:
  § Item counts and sizes, read/write/delete ratios
• Review our documentation and formulas