Monitoring - Instaclustr… · New Relic, Splunk, DataDog) have the capability to retrieve...

Post on 20-May-2020

Monitoring

Instaclustr Managed Service

Instaclustr’s Managed Service includes full monitoring, with all configuration completed as part of automated cluster provisioning. Alerts are automatically configured and routed to Instaclustr’s tech-ops team for actioning. Customers have an interactive view of monitoring through the Instaclustr Console and can use a REST API to integrate information from Instaclustr’s monitoring into a single view of an application stack.

Open Source & Other Alternatives

Cassandra’s metrics are managed by the popular Dropwizard Metrics framework. Monitoring data may be retrieved from Cassandra by querying it directly via JMX. Alternatively, metrics may be pushed to an external service by configuring Dropwizard plugins. Graphite and Grafana are popular web-based graphical interfaces for displaying metrics, and there are numerous tutorials and sample configurations available on the web for configuring them (just Google “cassandra monitoring grafana”). In addition to the free and open source options, most general enterprise monitoring solutions (e.g. New Relic, Splunk, DataDog) can retrieve metrics from Dropwizard or JMX, and some provide plugins and pre-built dashboards for Apache Cassandra.
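To make the "push" model concrete, here is a minimal sketch of what a metric looks like on the wire when it is sent to Graphite using its plaintext protocol (one `path value timestamp` line per metric, conventionally on port 2003). It is a hand-rolled illustration, not the Dropwizard reporter itself. The metric paths, the sample values and the host name `graphite.example.com` are all assumptions made up for the example.

```python
import socket
import time


def graphite_line(path, value, timestamp=None):
    """Format one metric in Graphite's plaintext protocol: 'path value timestamp'."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"


def send_to_graphite(lines, host="graphite.example.com", port=2003):
    """Send pre-formatted lines to a Graphite plaintext listener (hypothetical host)."""
    payload = "".join(lines).encode("ascii")
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload)


# Hypothetical sample values; in practice these would be read from JMX
# or emitted by a Dropwizard reporter rather than hard-coded.
sample = {
    "cassandra.node1.client_request.read.latency.p99": 4.2,
    "cassandra.node1.storage.load_bytes": 123456789,
}
lines = [graphite_line(path, value, timestamp=1589932800)
         for path, value in sample.items()]
```

Calling `send_to_graphite(lines)` would deliver both metrics in a single connection. In a real deployment a Dropwizard reporter plugin does this work for you.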

Backup and Restore

Instaclustr Managed Service

Advanced backup and restore functionality is a key part of the capability of Instaclustr’s Managed Service. Our management system provides:

• Daily snapshot-based backups
• Continuous backup with point-in-time recovery (optional extra)
• On-demand, user-initiated backups via console or API
• On-demand, user-initiated cluster or table restore via console or API

Open Source & Other Alternatives

Due to the immutable nature of Cassandra data files, basic backup is a straightforward exercise. We’ve seen a lot of organisations have success with simple shell script solutions.

TableSnap (https://github.com/JeremyGrosser/tablesnap) is another, more sophisticated option.
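As a rough illustration of the simple scripted approach, the sketch below takes a named snapshot with `nodetool snapshot` and then locates the resulting snapshot directories so their files can be copied elsewhere. Snapshots are hard links to the immutable SSTable files, which is why this is cheap. The `upload` callback, the tag and keyspace names, and the data directory passed in are assumptions for the example. The on-disk layout `<data_dir>/<keyspace>/<table>/snapshots/<tag>` should be checked against your own installation.

```python
import subprocess
from pathlib import Path


def snapshot_command(tag, keyspace):
    """Build the nodetool command that takes a named snapshot of one keyspace."""
    return ["nodetool", "snapshot", "-t", tag, keyspace]


def find_snapshot_dirs(data_dir, keyspace, tag):
    """Find snapshot directories laid out as <data_dir>/<keyspace>/<table>/snapshots/<tag>."""
    pattern = f"{keyspace}/*/snapshots/{tag}"
    return sorted(p for p in Path(data_dir).glob(pattern) if p.is_dir())


def backup(tag, keyspace, data_dir, upload):
    """Take a snapshot, then pass each snapshot file to a caller-supplied upload function."""
    subprocess.run(snapshot_command(tag, keyspace), check=True)
    for snap_dir in find_snapshot_dirs(data_dir, keyspace, tag):
        for path in sorted(snap_dir.iterdir()):
            if path.is_file():
                upload(path)  # hypothetical: copy to object storage, another host, etc.
```

A call such as `backup("daily", "my_keyspace", "/var/lib/cassandra/data", upload=print)` would snapshot the keyspace and list the files that need to be shipped off the node. Old snapshots still need to be cleared separately to reclaim disk space.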

Restoring Cassandra is somewhat more complex. However, restores are infrequently required due to Cassandra’s inbuilt replication and native ability to repopulate a replacement node from replicas in the event of hardware failure. Many teams happily rely on manual procedures for infrequent use (although these should definitely be documented and regularly tested). A hybrid scripted/manual restore process, which automates the majority of the steps while leaving cluster startup to be coordinated manually once restoration is complete, is often a good compromise.

Repairs

Repairs are a batch process that must be run regularly (generally at least weekly) against a Cassandra cluster to ensure replicas of data stay consistent over time.

Instaclustr Managed Service

Instaclustr’s management system automatically coordinates and runs repairs across managed clusters. Repairs are staggered and coordinated to minimise performance impact on the cluster. Where required, our technical operations team assists in tuning an appropriate repair strategy for your cluster and workload to provide minimal-impact repairs.

Open Source & Other Alternatives

Cassandra Reaper (http://cassandra-reaper.io/) is the go-to tool for Cassandra repair management. It provides a broad range of scheduling, retry and other features, is broadly deployed and has good community support.
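To illustrate what staggering repairs means in practice, here is a minimal sketch that spreads one repair per node evenly across a weekly cycle. It is not how Reaper works internally, and it omits the retry and coordination features that make a dedicated tool worthwhile. The node addresses, keyspace name and start date are assumptions for the example. The `-pr` option restricts each run to the node's primary ranges, so every node needs its own slot for the whole ring to be covered.

```python
from datetime import datetime, timedelta


def staggered_repair_schedule(nodes, cycle_start, cycle=timedelta(days=7)):
    """Spread one repair per node evenly across the cycle so no two start together."""
    if not nodes:
        return []
    step = cycle / len(nodes)
    return [(node, cycle_start + i * step) for i, node in enumerate(nodes)]


def repair_command(keyspace):
    """Primary-range repair of one keyspace, to be run on the node itself."""
    return ["nodetool", "repair", "-pr", keyspace]


nodes = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]  # hypothetical addresses
schedule = staggered_repair_schedule(nodes, datetime(2020, 5, 18))
for node, start in schedule:
    print(node, start.isoformat(), " ".join(repair_command("my_keyspace")))
```

With four nodes and a seven-day cycle, the repairs start 42 hours apart. A simple cron entry per node could implement the same idea.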

Automated Deployment

Instaclustr Managed Service

Instaclustr’s Managed Service provides automated deployment for everything from the initial creation of the cluster through to adding nodes and data centers. The provisioning system takes care of everything, including creating cloud provider instances, configuring networking and firewalls, and managing attached storage.

Specific features include:

• Support for AWS, Azure, GCP and IBM SoftLayer, in either our cloud provider account or yours
• Create a new cluster, including configuration of add-ons such as Spark, Elassandra, Kibana and Zeppelin
• Add nodes or data centers
• Dynamically resize the instances in the cluster to quickly scale its processing capacity up or down
• Build a new cluster from backups of an existing cluster

Open Source & Other Alternatives

There are numerous tools that can assist with automated deployment. Which one you choose will likely depend on your existing environment and toolset. For example:

• Puppet/Chef/Ansible/SaltStack/Fabric (e.g. https://forge.puppet.com/locp/cassandra)
• Kubernetes: https://github.com/IBM/Scalable-Cassandra-deployment-on-Kubernetes
• Terraform

Cluster Health Checks

Instaclustr Managed Service

Instaclustr’s Managed Service includes a cluster health check page that gives at-a-glance feedback on the status of your cluster against key health indicators:

• Disk Usage
• Partition Size
• Replication Factor and Strategy
• Tombstone Levels

Open Source & Other Alternatives

All of these factors can be readily checked using built-in Cassandra features and available metrics. However, you will likely need to do a bit of work yourself to define the rules you are checking against.
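Defining those rules can be as simple as a table of named checks. The sketch below evaluates one node's figures against the four indicators listed above. The `NodeStats` fields, the threshold values and the sample numbers are all illustrative assumptions, not recommendations. Collecting the inputs (from JMX metrics, `nodetool` output or the schema) is left out.

```python
from dataclasses import dataclass


@dataclass
class NodeStats:
    disk_used_pct: float         # percent of the data volume in use
    max_partition_mb: float      # largest partition observed, in MB
    tombstones_per_read: float   # average tombstones scanned per read
    replication_factor: int
    replication_strategy: str


# Illustrative thresholds only; tune them for your own cluster and workload.
RULES = [
    ("Disk Usage", lambda s: s.disk_used_pct < 70),
    ("Partition Size", lambda s: s.max_partition_mb < 100),
    ("Replication Factor and Strategy",
     lambda s: s.replication_factor >= 3
     and s.replication_strategy == "NetworkTopologyStrategy"),
    ("Tombstone Levels", lambda s: s.tombstones_per_read < 1000),
]


def health_check(stats):
    """Return 'OK' or 'WARN' for each health indicator."""
    return {name: ("OK" if rule(stats) else "WARN") for name, rule in RULES}


# Hypothetical figures for a single node.
report = health_check(NodeStats(82.5, 40.0, 12.0, 3, "NetworkTopologyStrategy"))
for indicator, status in report.items():
    print(f"{indicator}: {status}")
```

Keeping the rules in a plain list makes it easy to add further checks or adjust thresholds without touching the evaluation logic.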