Preparing for Success: The Top Deployment Do’s and Don’ts – Couchbase Connect 2016

©2016 Couchbase Inc. Best Practices: Preparing for Success - Top Deployment Do’s & Don’ts - Ian McCloy & Karthik Sekar

Upload: couchbase

Post on 15-Feb-2017


Category:

Software



TRANSCRIPT

©2016 Couchbase Inc.

Best Practices: Preparing for Success - Top Deployment Do’s & Don’ts

Ian McCloy & Karthik Sekar

Ian McCloy
Couchbase Technical Support Manager – EMEA

Ian McCloy is the Technical Support Manager for Couchbase in EMEA. Ian is based in Couchbase’s Manchester U.K. engineering lab. Previously, Ian was a senior inventor and team lead with IBM’s System and Technology Group’s Storage Systems division. Ian specializes in OS platforms, virtualization, distributed systems, storage, and networking. Ian is a VMware certified engineer and has authored many patents in a diverse range of technologies from server hardware, virtualization, software testing, security, and language translation.

Karthik Sekar
Solutions Architect - WW Field Operations

Karthik is a Solutions Architect, part of Professional Services - Worldwide Technical Field Operations at Couchbase, with expertise in Big Data, NoSQL technologies, cloud, and distributed computing platforms. He specializes in setting up, configuring, securing, tuning, architecting, and managing mission-critical systems. His current major responsibilities are providing architecture reviews and technical product assistance, and sharing best practices with customers.

Agenda

• OS Tuning

• Platform Tuning

• Bucket Recommendations

• Sizing - Why Sizing is Crucial for Success

• Best Practices – Views & Indexing

• The Breaking Point


OS Tuning (Linux)

Linux

Disable THP (Transparent Huge Pages)

Use XFS File System

Create Swap Space

Set Swappiness to 0
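Two of the Linux settings above can be checked programmatically. The following is a minimal sketch, not an official Couchbase tool: the function names are hypothetical, and it assumes the usual kernel format for the THP setting, where the active mode is shown in square brackets (for example `always madvise [never]`).

```python
def thp_mode(text: str) -> str:
    """Return the active THP mode from text such as 'always madvise [never]'."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError(f"no active mode found in {text!r}")


def check_linux_tuning(thp_text: str, swappiness_text: str) -> list[str]:
    """Return a list of warnings; an empty list means both checks pass."""
    warnings = []
    # The slide recommends disabling Transparent Huge Pages.
    if thp_mode(thp_text) != "never":
        warnings.append("Transparent Huge Pages are not disabled")
    # The slide recommends a swappiness of 0.
    if int(swappiness_text.strip()) != 0:
        warnings.append("vm.swappiness is not 0")
    return warnings
```

On most distributions the inputs would come from `/sys/kernel/mm/transparent_hugepage/enabled` and `/proc/sys/vm/swappiness`; some distributions expose THP under a different path, so treat those locations as assumptions to verify on your system.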


OS Tuning (Windows)

Windows

Increase TCP ephemeral ports

Tune Virus Scanners

Tune Backup Utilities


Platform Tuning (Virtualization)

Virtual Machine

Avoid a Single Point of Failure

Do not over-commit your resources

Avoid Live Migration

Increase Auto-Failover Threshold


System Clocks

Ensure NTP (Network Time Protocol) is configured correctly before adding a node to the cluster

TTL (Time To Live) document expiry is evaluated per Couchbase node, against the clock of that node


Server Quota

Keep the per-node Server Quota at or below the best-practice maximum of 80% of total system RAM

Provision enough capacity before demand exceeds what the cluster can provide. Failure to do this can prevent normal recovery procedures

File-System Cache needs RAM

Proactive monitoring of workload and cluster capacity
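The 80% guideline is simple arithmetic. A minimal sketch follows, with hypothetical function names; the remaining RAM is what the operating system and file-system cache have to work with.

```python
def max_server_quota_mb(total_ram_mb: int, max_fraction: float = 0.80) -> int:
    """Largest per-node Server Quota (MB) within the given fraction of system RAM."""
    if total_ram_mb <= 0:
        raise ValueError("total_ram_mb must be positive")
    return int(total_ram_mb * max_fraction)


def quota_within_best_practice(quota_mb: int, total_ram_mb: int,
                               max_fraction: float = 0.80) -> bool:
    """True if the configured quota does not exceed the recommended ceiling."""
    return quota_mb <= total_ram_mb * max_fraction
```

For example, `max_server_quota_mb(10000)` gives the ceiling for a node with 10,000 MB of RAM, and `quota_within_best_practice(9000, 10000)` flags a quota that leaves too little for everything else.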


Bucket Tuning

Use as few buckets as possible.

Disable “Flush” on production buckets to prevent accidental data loss.

Replica Recommendations

As a rule of thumb, we recommend the following:

One replica for up to five data service nodes.

One or two replicas for five to ten nodes.

One, two, or three replicas for over ten nodes.
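The rule of thumb above can be written as a small lookup. This is a sketch only: the function name is hypothetical, and because the slide's ranges overlap at exactly five and exactly ten nodes, the code assumes those boundary values fall into the smaller bracket.

```python
def recommended_replica_counts(data_nodes: int) -> tuple[int, ...]:
    """Replica counts the rule of thumb allows for a given number of data nodes."""
    if data_nodes < 1:
        raise ValueError("data_nodes must be at least 1")
    if data_nodes <= 5:       # up to five data service nodes
        return (1,)
    if data_nodes <= 10:      # assumption: exactly ten is treated as "five to ten"
        return (1, 2)
    return (1, 2, 3)          # over ten nodes
```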

Sizing

Why Sizing Matters

One of the most common support issues is under-provisioned clusters.

• Data Explosion: Explosive growth of applications for the Digital Economy - the web, mobile and IoT

• Performance Demands – Low latency access

• Capacity Planning - Sufficient capacity to handle peak loads

• High Availability – For today’s mission critical apps, high availability is no longer a ‘nice to have’ but is essential.


Hardware Minimums

RAM: At least ~8 GB (highly dependent on dataset)

Disk: Fastest “local” storage available
- SSD is better
- RAID 0, 10
- Avoid SAN, if possible

CPU (minimums): 8 cores
+ 1 per bucket
+ 1 per design document (if Views are used)
+ 1 per 10K mutations/sec
+ 1 per XDCR stream

Hardware requirements/recommendations are the intersection of what’s needed versus what’s available.

Sizing Couchbase Server

• Multi-Dimensional Scalability (MDS) – Optionally scale each service independently:
  • Data
  • Index
  • Query

MDS is the architecture that enables independent scaling of data, query, and indexing workloads while being managed as one cluster.

Sizing Couchbase Server - Data

• Data Service:

  • Same as previous Couchbase Server 3.x version
  • Enough RAM to cache reads
  • Enough Disk to eventually persist writes
  • CPU primarily for Views and XDCR
  • At least 3 nodes – Replication at the bucket level

• Minimum requirements: 4GB RAM, 8 Cores CPU

Sizing Couchbase Server - Index

• Index service:

  • Primarily RAM and Disk IO bound
  • Index Types
    • Standard Global Secondary Indexes (GSI)
    • Memory Optimized Indexes (MOI)
  • At least 2 nodes for HA, each index replicated individually

• Minimum Requirements: 32 GB RAM, 16 core CPU, “fast disk”, “as much RAM as you need for MOI”

Sizing Couchbase Server - Query

• Query Service :

  • Primarily CPU bound
  • Optimized for multi-core systems
  • Very low RAM and disk requirements
  • At least 2 nodes for HA – Queries automatically load balanced

• Minimum Requirements: 8 GB RAM, 8+ Core CPU

5 Factors of Sizing

How Many Nodes?

5 Key Factors determine the number of nodes needed:

1) RAM
2) Disk
3) CPU
4) Network
5) Data Distribution/Safety

(per-bucket, multiple buckets aggregate)
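One way to combine the factors is to size each one separately and let the most demanding factor set the node count. The sketch below is an illustration under that assumption, not a Couchbase formula: the function name, dictionary keys, and units are hypothetical, and the default floor of three nodes follows the "3+ nodes" starting point recommended later in the deck.

```python
import math


def nodes_needed(requirements: dict[str, float],
                 per_node_capacity: dict[str, float],
                 minimum_nodes: int = 3) -> int:
    """Node count driven by whichever factor needs the most nodes.

    Both dictionaries use the same keys (e.g. "ram_gb", "disk_gb") and units.
    Per-bucket needs should already be summed across all buckets.
    """
    counts = [
        math.ceil(requirements[factor] / per_node_capacity[factor])
        for factor in requirements
    ]
    return max([minimum_nodes, *counts])
```

For example, a cluster needing 400 GB of RAM and 3,000 GB of disk on nodes offering 100 GB and 1,000 GB respectively is RAM-bound: RAM calls for four nodes while disk calls for three.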


RAM sizing

1) Total RAM
§ Managed document cache:
  § Working set
  § Metadata
  § Active + Replicas
§ Index caching (I/O buffer)

Keep working set in RAM for best read performance

[Diagram: Reading data. The application server asks “Give me document A”; the server replies “Here is document A”.]
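The RAM components listed above (working set, metadata, active plus replica copies) can be turned into a rough estimate. The sketch below is modeled on the general shape of Couchbase's published sizing guidance as best recalled; the default constants (56 bytes of metadata per document, 25% headroom, 85% high water mark) are assumptions that should be checked against the documentation for your version, and the function name is hypothetical.

```python
def cluster_ram_quota_bytes(documents: int, key_size: int, value_size: int,
                            replicas: int, working_set_fraction: float,
                            metadata_per_document: int = 56,   # assumed constant
                            headroom: float = 0.25,            # assumed constant
                            high_water_mark: float = 0.85      # assumed constant
                            ) -> float:
    """Rough cluster-wide RAM quota for one bucket, in bytes."""
    copies = 1 + replicas                                   # active + replicas
    metadata = documents * (metadata_per_document + key_size) * copies
    dataset = documents * value_size * copies
    working_set = dataset * working_set_fraction            # portion kept in RAM
    return (metadata + working_set) * (1 + headroom) / high_water_mark
```

Dividing the result by the per-node RAM quota gives a first estimate of the node count for the RAM factor; validate any such number with testing.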


Working set depends on your application

• Resident item ratio shows the percentage of active documents that reside in memory. Typically you want your working set (actively accessed documents) to be in memory for low latencies and an awesome user experience.

• The recommended best practice is to keep the document resident ratio above 20% at all times.

Metadata Overhead: Indicates that a bucket is now using more than 50% of the allocated RAM for storing metadata and keys, reducing the amount of RAM available for data values. This is a helpful indicator that you may need to add nodes to your cluster.
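Both thresholds on this slide lend themselves to a simple monitoring check. A minimal sketch follows; the function and parameter names are hypothetical, and the inputs are assumed to come from whatever bucket statistics your monitoring already collects.

```python
def bucket_memory_warnings(resident_ratio_pct: float,
                           metadata_bytes: int,
                           bucket_quota_bytes: int,
                           min_resident_pct: float = 20.0,
                           max_metadata_fraction: float = 0.5) -> list[str]:
    """Return warnings for the two thresholds named on the slide."""
    warnings = []
    # Best practice: resident ratio should always stay above 20%.
    if resident_ratio_pct <= min_resident_pct:
        warnings.append("resident ratio is not above the recommended minimum")
    # Metadata overhead: more than 50% of allocated RAM used by metadata and keys.
    if metadata_bytes > max_metadata_fraction * bucket_quota_bytes:
        warnings.append("metadata uses more than half of the bucket quota")
    return warnings
```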


Disk Sizing: Space and I/O

2) Disk
§ Sustained write rate
§ Rebalance capacity
§ Backups
§ XDCR
§ Views/Indexes
§ Compaction
§ Total dataset:
  § (active + replicas + indexes)
§ Append-only

[Diagram: Writing data. The application server says “Please store document A”; the server replies “OK, I stored document A”. The slide brackets the list items under two labels, I/O and Space.]

Disk Sizing: Space and I/O

• Disk Writes are Buffered
  • Bursts of data expand the disk write queue
  • Sustained writes need corresponding throughput

• Disk throughput affected by disk speed
  • SSD > 10K RPM > EBS
  • SSDs give a huge boost to write throughput and startup/warmup times
  • RAID can provide redundancy and increase throughput

• Throughput = read/write + compaction + indexing + XDCR
• Best to configure different paths for data and indexes
• Plan on about 3x space (append-only, compaction, backups, etc.)
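The last two rules of thumb above are plain arithmetic. A minimal sketch, with hypothetical function names and with units left to the caller (for example bytes for space and MB/s for throughput):

```python
def disk_space_to_provision(dataset_size: float, space_factor: float = 3.0) -> float:
    """Space to plan for: about 3x the dataset (append-only, compaction, backups)."""
    return dataset_size * space_factor


def required_disk_throughput(read_write: float, compaction: float,
                             indexing: float, xdcr: float) -> float:
    """Total throughput the disks must sustain, as the sum of all contributors."""
    return read_write + compaction + indexing + xdcr
```

Here `dataset_size` is meant as the total dataset (active + replicas + indexes), so a 500 GB dataset suggests provisioning roughly 1.5 TB.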


CPU Sizing

3) CPU
§ Disk writing
§ Views/compaction/XDCR
§ RAM r/w performance not impacted
§ Min. production requirement:
  8 cores
  + 1 per bucket
  + 1 core per Design Doc (if Views are used)
  + 1 core per XDCR stream
  + 1 core for every 10K mutations/sec
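The minimum production requirement above translates directly into a formula. In this sketch the function name is hypothetical, and rounding a partial 10K of mutations up to a whole core is an assumption made to keep the estimate conservative; the slide does not say how to round.

```python
import math


def min_cpu_cores(buckets: int, design_docs: int = 0, xdcr_streams: int = 0,
                  mutations_per_sec: int = 0, base_cores: int = 8) -> int:
    """Minimum production core count per the rule of thumb on the slide."""
    mutation_cores = math.ceil(mutations_per_sec / 10_000)  # assumption: round up
    return base_cores + buckets + design_docs + xdcr_streams + mutation_cores
```

For example, two buckets, three design documents, one XDCR stream, and 25,000 mutations per second come to 8 + 2 + 3 + 1 + 3 cores.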


Network Sizing

4) Network
§ Client traffic
§ Replication (writes)
§ Rebalancing
§ XDCR

[Diagram: Application servers connect over the network to Couchbase Server nodes. Labels: Reads + Writes between application servers and Couchbase Server; Replication (multiply writes) and Rebalancing between Couchbase Server nodes.]

Network Considerations

• Low latency, high throughput (LAN) - within cluster

• Eliminate router hops:
  • Within cluster nodes
  • Between clients and cluster

• Check who else is sharing the network

• Increase bandwidth by:
  • Adding more nodes (will scale linearly)
  • Upgrading routers, switches, NICs, etc.


Best Practices with Views


Couchbase Data Access

• Everything is built on top of Key Value

• A Document store is a special case of Key-Value

• Views provide aggregation and real-time analytics through incremental map-reduce

• Global Secondary Indexes provide low latency/high throughput indexes

• N1QL is a language that provides a powerful and expressive way of accessing documents


Best Practices - Selection, Projection, Aggregation

• Try to avoid computing too many things in a View

• Check for attribute existence

• Pre-filter data to avoid unnecessary entries in the View
  • Use document types to make Views more selective

• Project (map) only necessary data by emitting it as part of the value
  • Do not emit the full document
  • Back-reference via the original document id

• Use the built-in reduce functions if possible
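Couchbase view map functions are written in JavaScript; the sketch below only mirrors the selection and projection logic in Python for illustration. The document fields (`type`, `email`, `name`) and both function names are hypothetical.

```python
def map_user_by_email(doc: dict, doc_id: str):
    """Yield (key, value) rows the way a view map function would emit them."""
    # Pre-filter on document type so unrelated documents add no entries.
    if doc.get("type") != "user":
        return
    # Check for attribute existence before using the attribute as a key.
    if "email" not in doc:
        return
    # Project only the needed field; never emit the full document.
    yield doc["email"], doc.get("name")


def build_index(docs: dict[str, dict]) -> list[tuple]:
    """Apply the map function to every document, keeping the document id
    alongside each row so the original document can be looked up later."""
    rows = []
    for doc_id, doc in docs.items():
        for key, value in map_user_by_email(doc, doc_id):
            rows.append((key, doc_id, value))
    return sorted(rows)
```

Only documents that pass both checks produce a row, which keeps the index small; a reader of the index fetches the full document by id when it is needed.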


Database Design Considerations for Views

Number of Design Documents per Bucket

• Indexers are allocated per Design Document

• Bad cases
  • One Design Document contains all Views
    Ø All Views are updated at the same time: a lot to do for the Indexer
  • One View per Design Document
    Ø Resource intensive because there is one Indexer per View

• Aim for a good balance between these extremes!

The Breaking Point


Data Distribution

5) Data Distribution/Safety (assuming one replica):
§ 1 node = Single point of failure
§ 2 nodes = + Replication
§ 3+ nodes = Best to start with, for Test/Production
  § Auto-failover
  § Upgrade-ability
  § Further scale-ability
§ 7+ nodes = 3 Data nodes + 2 Index nodes + 2 Query nodes – Minimum for MDS (for Production)
§ Single-node deployment is not for Test/Production; for development, however, it is fine to start with

Servers fail, be prepared. The more nodes, the less impact a failure will have.

As your Dataset Grows…

Effects on scale/sizing:
• Your RAM needs will grow:
  • Metadata needs increase with item count
  • Is your working set increasing?
• Your disk space will likely grow (duh?)

Indications:
• Dropping resident ratio
• Rising ejections/cache miss ratio

What to do:
• Revise sizing calculations, add more nodes
• Remove un-needed data

This is the most common need for scaling and will most likely result in needing more nodes.

To Ensure Success…

• Test, Deploy, Monitor… rinse and repeat

• Drills for Failover Testing
  • Simulate disk failures, network failures, hung processes, and cluster failures

• Proactive monitoring, efficient cluster and process management

• Define the SLA, RPO, RTO

• Stress test at N times the load of normal operations

• Isolate systems interfering with performance

Sizing is tricky business…

• Work with the Couchbase Team

• Validate your “on-paper” numbers with testing

• Constantly monitor production

• Gather your workload and dataset requirements:
  § Item counts and sizes, read/write/delete ratios

• Review our documentation and formulas

Thank you