EXECUTIVE SUMMARY
Enterprises across industries struggle to manage massive quantities of data, much of it entering
their systems at high velocity. While NoSQL has emerged as a solution in some cases, many
applications still rely on a relational database management system (RDBMS). To further complicate
the management of these systems, the NoSQL space has changed rapidly, with many offerings
emerging. While data scientists, architects, and developers choose the system that best matches
their use cases, it is the administrators who are forced to manage all of these complex systems
and meet business SLAs.
Robin Systems has teamed up with DataStax so that administrators can deploy DataStax Enterprise
(DSE) on Robin’s application-aware infrastructure software, which is optimized for container
technologies. DataStax Enterprise is the best distribution of Apache Cassandra™ and also includes
developer tooling, administration and monitoring, search, operational analytics, and graph, all in
a unified, always-on data platform. By working with Robin Cloud Platform (RCP), an administrator
can now also achieve:
» Productivity improvement through simplified operations and user experience
» Cost reduction through guaranteed performance, even in shared multi-tenant environments, which enables hardware consolidation
» Risk reduction through repeatable and automated processes such as 1-click cluster provisioning and patch/upgrade
» Agility optimization through full Application & Data Lifecycle Management, with significantly reduced time and storage
ROBIN SYSTEMS | 224 AIRPORT PKWY SAN JOSE CA 95110 | (408) 216-0769 | ROBINSYSTEMS.COM
Simplifying Data Management with DataStax and Robin Cloud Platform (RCP)
WHITE PAPER
The power behind the moment.
A PARTNERSHIP TO ENABLE EFFICIENT MANAGEMENT OF DSE
RCP is a container-based, application-centric server and storage virtualization platform that turns
commodity hardware into a high-performance, elastic, and agile big data & database consolidation platform. RCP
is designed to cater not just to stateless applications, but also to performance- and data-centric applications such
as DataStax Enterprise clusters. DataStax administrators were facing the following challenges:
» Low Server Utilization - Underlying hardware has to be sized for peak workloads, leaving large amounts of spare capacity and idle hardware due to varying load profiles.
» Sizing Production Workloads During Development - To accurately size environments, an organization must estimate the read and write performance expected from the designed configuration. This requires testing and experimentation, and even then the infrastructure might be over- or under-provisioned.
» Availability Planning - While Cassandra is designed to withstand temporary node failures, permanent node failures must be resolved by adding replacement nodes, which puts additional load on the remaining active nodes.
» Cloning Data for Dev/Test Environments - Typical scenarios that require a subset of the production data include debugging, performance and stress testing, and splitting the read workload across multiple clusters. DataStax does not have an automated way to clone a subset of production data.
» Scaling Out vs Up - Administrators need the ability to simply add and remove resources (scale up or down) in their cluster dynamically, in real time, to deal with temporary load variations.
» Patch & Upgrade Automation - Administrators have to periodically orchestrate updates across all nodes of Cassandra without any downtime, and be able to rapidly revert changes in case of failures.
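The patch and upgrade challenge in the last bullet can be pictured as a rolling update with rollback. The following is a minimal Python sketch under stated assumptions, not a description of how RCP or DataStax implements upgrades: the function name, the version strings, and the `is_healthy` probe are all hypothetical, and the "upgrade" is reduced to changing a version label.

```python
from typing import Callable, Dict, List


def rolling_upgrade(
    nodes: List[str],
    versions: Dict[str, str],
    target: str,
    is_healthy: Callable[[str, str], bool],
) -> bool:
    """Upgrade one node at a time; revert every touched node on failure.

    `versions` maps node name -> installed version and is updated in place.
    `is_healthy(node, version)` is a hypothetical health probe supplied by
    the caller; a real probe would check the running database process.
    """
    previous: Dict[str, str] = {}  # node -> version before this run
    for node in nodes:
        previous[node] = versions[node]
        versions[node] = target  # stand-in for the real upgrade step
        if not is_healthy(node, target):
            # Roll back in reverse order, including the node that just failed,
            # so the cluster returns to its original state.
            for touched in reversed(list(previous)):
                versions[touched] = previous[touched]
            return False
    return True


if __name__ == "__main__":
    cluster = {"n1": "v1", "n2": "v1", "n3": "v1"}
    print(rolling_upgrade(list(cluster), cluster, "v2", lambda n, v: True), cluster)
```

Upgrading one node at a time is what keeps the rest of the cluster serving requests; the `previous` map is what makes the rapid revert possible.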
RCP dramatically simplifies application and data lifecycle management with features such as one-click database
deploy, snapshot, clone, time travel, dynamic IOPS control, upgrade and performance guarantees.
[Figure: RCP architecture. Layers shown: SQL & NoSQL databases, big data, and stateful distributed apps; ROBIN Application-Aware Compute; ROBIN Application-Aware Storage and Data; ROBIN Fabric Controller; existing commodity hardware; cloud-extended compute and storage. Labels: application orchestration, QoS guarantee, agile data lifecycle management. Callouts: consolidate with bare metal performance; deliver guaranteed QoS and elastic scaling; enable agile, simple application management.]
ROBIN CLOUD PLATFORM (RCP)
RCP - THREE KEY COMPONENTS
Container-Based Agile Compute
RCP’s container-based virtualization technology helps consolidate applications
with complete runtime isolation and zero performance impact.
RCP achieves this by turning physical servers, either on premises or in the
cloud, into a container-based compute plane that can easily grow on demand.
Just as a hypervisor abstracts Operating Systems from the underlying
hardware, RCP’s compute plane abstracts applications from OS and
everything underneath, leading to simplified application deployment and
seamless portability for all types of enterprise applications – including
highly performance-sensitive workloads such as databases and Big Data.
Container-Aware Scale-Out Storage
RCP’s container-aware, software-defined block storage is designed
from the ground up to support agile, sub-second volume creation for
containers. RCP converts any commodity hardware with HDD, SSD or
NVMe disks into an enterprise-class storage plane, which can easily grow
with demand. It delivers core services like thin provisioning, compression,
encryption, data protection, simplified data lifecycle management via
snapshots and thin clones, and rapid application restores.
Application-Aware Fabric Controller
The application-aware fabric controller is the “brain” of the RCP platform.
Treating the application as the payload, the fabric controller automatically
decides the placement, provisions containers and storage for each
application component, and configures the application, thus enabling
single-click deployment of even the most complex applications. It also
continuously monitors the entire application and infrastructure stack to
automatically recover failed nodes and disks, fail over applications, and
ensure that each application dynamically gets adequate disk IO and
network bandwidth to deliver the Application-to-Spindle QoS guarantee.
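The placement decision described above can be illustrated with a small sketch. The paper does not describe the fabric controller's actual algorithm, so the following is only an assumption-laden example: a first-fit pass that checks free CPU, memory, and IOPS on each host and refuses to put two components of the same application on one host. The `Host`, `Component`, and `place` names and all capacities are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class Host:
    name: str
    cpu: int      # free CPU cores
    mem_gb: int   # free memory in GB
    iops: int     # free IOPS budget


@dataclass
class Component:
    name: str
    app: str      # components of the same app are kept on separate hosts
    cpu: int
    mem_gb: int
    iops: int


def place(components: List[Component], hosts: List[Host]) -> Optional[Dict[str, str]]:
    """First-fit placement; returns {component: host}, or None if any cannot fit."""
    free = {h.name: [h.cpu, h.mem_gb, h.iops] for h in hosts}
    apps_on: Dict[str, Set[str]] = {h.name: set() for h in hosts}
    plan: Dict[str, str] = {}
    for c in components:
        for h in hosts:
            cpu, mem, iops = free[h.name]
            fits = c.cpu <= cpu and c.mem_gb <= mem and c.iops <= iops
            if fits and c.app not in apps_on[h.name]:
                # Reserve the resources and record the assignment.
                free[h.name] = [cpu - c.cpu, mem - c.mem_gb, iops - c.iops]
                apps_on[h.name].add(c.app)
                plan[c.name] = h.name
                break
        else:
            return None  # no host satisfied both capacity and anti-affinity
    return plan


if __name__ == "__main__":
    hosts = [Host(f"h{i}", 8, 32, 50_000) for i in (1, 2, 3)]
    nodes = [Component(f"cass-{i}", "cassandra", 4, 16, 20_000) for i in range(3)]
    print(place(nodes, hosts))
```

Keeping replicas of one application on separate hosts is a common choice for replicated databases, since a single host failure then affects at most one replica.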
THE UNIFIED SOLUTION: DATASTAX ENTERPRISE ON RCP
Together, DataStax and RCP provide advanced functionality designed to accelerate your ability to create intelligent
and compelling applications, using powerful indexing, search, analytics and graph functionality, coupled
with a smart infrastructure platform required to deploy and manage all your application and data components.
THE UNIFIED SOLUTION PROVIDES
1. Improved utilization and predictable performance
2. Simplified cluster lifecycle management and improved availability
3. Agile data management
4. Comprehensive scaling strategies
Improved Utilization and Predictable Performance
RCP uses containers to provide 1-click, rapid, self-service deployment of Cassandra and DataStax clusters. While
containers provide process isolation, RCP’s App-to-Spindle Quality of Service feature, combined with container
technology, provides complete performance isolation. This means that RCP allows multiple applications to run
on the same server and storage without impacting each other, which increases average hardware utilization
and allows significantly larger consolidation ratios. Typically, customers see over 40% reduction in hardware by adopting
Robin Cloud Platform.
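One way to picture performance isolation on shared storage is a guaranteed-minimum allocator. The sketch below is an assumption made for illustration and not RCP's App-to-Spindle mechanism: each application first receives the smaller of its demand and its guarantee, and any spare capacity is then split among applications with unmet demand in proportion to that unmet demand. The function name and all figures are hypothetical.

```python
from typing import Dict


def allocate_iops(
    capacity: int,
    guaranteed: Dict[str, int],
    demand: Dict[str, int],
) -> Dict[str, int]:
    """Honor per-app IOPS guarantees first, then share what is left over."""
    if sum(guaranteed.values()) > capacity:
        raise ValueError("guarantees exceed device capacity")
    # Step 1: each app gets its guarantee, but never more than it asks for.
    grant = {app: min(demand.get(app, 0), g) for app, g in guaranteed.items()}
    spare = capacity - sum(grant.values())
    # Step 2: split spare IOPS in proportion to unmet demand (integer math,
    # rounded down, so the total never exceeds capacity).
    unmet = {app: demand.get(app, 0) - grant[app] for app in grant}
    total_unmet = sum(unmet.values())
    if total_unmet > 0:
        give = min(spare, total_unmet)
        for app in grant:
            grant[app] += unmet[app] * give // total_unmet
    return grant


if __name__ == "__main__":
    print(allocate_iops(
        capacity=50_000,
        guaranteed={"cassandra": 20_000, "batch": 10_000},
        demand={"cassandra": 20_000, "batch": 60_000},
    ))
```

In this example the heavy batch workload is limited to what remains after the database has received its guaranteed share, which is the behavior a noisy-neighbor guarantee is meant to provide.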
RCP is designed to provide the benefits of virtualization without sacrificing performance. The graph below shows
the results of seven YCSB tests on a 3-node Cassandra cluster running first on bare metal servers with local
storage, and then on the Robin Cloud Platform. The results show that introducing RCP into
the environment had a negligible impact on the overall performance of the Cassandra cluster.
[Figure: Cassandra YCSB benchmark for 1 billion records, bare metal vs. Robin vs. virtual machines. Workloads on the x-axis: A (load), A (run), B, C, D, E, F. Y-axis scale: 0 to 120,000 (units not given). Series: Bare Metal (no containers + local storage); Robin (containers + Robin storage); VM (KVM + local storage).]
Simplified Cluster Lifecycle Management and Improved Availability
RCP’s orchestration capabilities combined with DataStax OpsCenter make deployment of large and complex
clusters a breeze, while the App-to-Spindle Quality of Service control removes the pressure of right-sizing the
cluster for the desired load and performance profile during initial deployment. Administrators can now change,
dynamically and in real time, the CPU, memory, and read and write IOPS assigned to individual clusters.
RCP makes permanent node failures a thing of the past, and even temporary failures are reduced to a matter of
a few seconds. RCP can seamlessly relocate containers from failed nodes to healthy ones, all while retaining the
same volumes and IP addresses. This means applications see only a small downtime, and administrators are
saved from the overhead of adding replacement nodes and rebalancing data.
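The reason relocation avoids rebalancing is that the container keeps its identity. A minimal sketch of that idea follows, with hypothetical names and addresses and a simple round-robin choice of target host; it is not RCP's failover logic. Only the `host` field changes, while the IP address and volume stay attached to the container.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Container:
    name: str
    host: str
    ip: str       # stays with the container across hosts
    volume: str   # stays with the container across hosts


def relocate_failed(
    containers: List[Container],
    healthy_hosts: List[str],
    failed_host: str,
) -> List[Container]:
    """Move containers off `failed_host`, keeping each one's IP and volume."""
    if not healthy_hosts:
        raise ValueError("no healthy host available")
    result: List[Container] = []
    moved = 0
    for c in containers:
        if c.host == failed_host:
            target = healthy_hosts[moved % len(healthy_hosts)]  # round-robin
            moved += 1
            result.append(replace(c, host=target))  # only the host changes
        else:
            result.append(c)
    return result


if __name__ == "__main__":
    cluster = [
        Container("cass-1", "h1", "10.0.0.11", "vol-1"),
        Container("cass-2", "h2", "10.0.0.12", "vol-2"),
    ]
    for c in relocate_failed(cluster, ["h3"], failed_host="h1"):
        print(c)
```

Because the relocated container comes back with the same address and the same data volume, the rest of the cluster can treat it as the original node returning rather than as a new node that needs data streamed to it.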
Agile Data Management
RCP allows unlimited snapshots (point-in-time copies) of the complete database cluster, including OS, DB binary,
configuration, schema, and data. RCP snapshots are space-efficient, online, and capture only the delta changes
since the last snapshot. Hence they can be utilized as an effective replacement for traditional database backups.
These snapshots can be used to restore or refresh the database to the desired point in time without having to move
large backup files on and off the database cluster nodes.
Snapshots can also be used to create thin clones of the entire database cluster. Like snapshots, the clones are
space-efficient, and hence can be leveraged to create rapid copies of large production clusters for development
and testing purposes, with no performance penalties.
Thus, RCP offers cluster-wide automated backup, restore, and cloning, resulting in greatly simplified
storage planning and operations.
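The space efficiency of delta snapshots and thin clones can be shown with a toy block volume. This is a generic copy-on-write sketch, assumed for illustration; the paper does not describe RCP's storage internals, and the `Volume` class and its methods are hypothetical.

```python
from typing import Dict, List, Optional


class Volume:
    """Toy block volume with delta snapshots and copy-on-write clones."""

    def __init__(self, layers: Optional[List[Dict[int, bytes]]] = None) -> None:
        # Frozen snapshot layers, oldest first. Clones share these dicts
        # instead of copying them, which is what makes a clone "thin".
        self.snapshots: List[Dict[int, bytes]] = list(layers or [])
        # Blocks written since the last snapshot.
        self.delta: Dict[int, bytes] = {}

    def write(self, block: int, data: bytes) -> None:
        self.delta[block] = data

    def read(self, block: int) -> Optional[bytes]:
        # Newest data wins: live delta first, then snapshots newest to oldest.
        for layer in [self.delta] + self.snapshots[::-1]:
            if block in layer:
                return layer[block]
        return None

    def snapshot(self) -> int:
        """Freeze the current delta as a new layer and return its index."""
        self.snapshots.append(self.delta)
        self.delta = {}
        return len(self.snapshots) - 1

    def clone(self, snap: int) -> "Volume":
        """Thin clone of snapshot `snap`; new writes land in the clone only."""
        return Volume(self.snapshots[: snap + 1])


if __name__ == "__main__":
    prod = Volume()
    prod.write(0, b"a")
    prod.write(1, b"b")
    s0 = prod.snapshot()
    prod.write(1, b"B")
    s1 = prod.snapshot()
    dev = prod.clone(s0)
    dev.write(0, b"z")
    print(len(prod.snapshots[s1]), dev.read(0), prod.read(0))
```

The second snapshot stores only the one block that changed, and the clone stores nothing until it is written to, so a dev/test copy of a large volume starts at close to zero additional space.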
[Figure: Overview chart of read IOPS (4 KB) before applying QoS and after applying QoS that guaranteed ~20K IOPS. The QoS panel shows controls for CPU Cores, Memory, Read IOPS, Write IOPS, and Priority.]
Comprehensive Scaling Strategies
As discussed earlier, scaling typically means incremental addition of nodes. While RCP greatly simplifies this
activity by implicitly provisioning the infrastructure along with the software, it also adds the new paradigm of
scaling clusters up and down. This means you can change resource allocation to individual clusters dynamically
and in real time, which makes it easy to cater to transient spikes in workload due to expected or unexpected
changes in application usage.
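The choice between resizing existing nodes and adding nodes can be sketched as a small planning function. This is a simplification assumed for illustration: it treats per-node IOPS as the only resource, and the names `Plan`, `plan_scaling`, and `per_node_max` are hypothetical rather than part of RCP.

```python
from typing import NamedTuple


class Plan(NamedTuple):
    action: str         # "resize" (scale up or down in place) or "scale_out"
    per_node_iops: int  # IOPS limit to assign to each node
    nodes: int          # node count after the change


def plan_scaling(required_iops: int, nodes: int, per_node_max: int) -> Plan:
    """Prefer changing per-node limits; add nodes only when hosts are maxed out."""
    # Ceiling division without floats.
    needed_per_node = (required_iops + nodes - 1) // nodes
    if needed_per_node <= per_node_max:
        # The load fits on the current nodes: just raise or lower their limits.
        return Plan("resize", needed_per_node, nodes)
    # Even at the maximum limit the current nodes are not enough: add nodes.
    new_nodes = (required_iops + per_node_max - 1) // per_node_max
    return Plan("scale_out", per_node_max, new_nodes)


if __name__ == "__main__":
    print(plan_scaling(required_iops=90_000, nodes=3, per_node_max=50_000))
    print(plan_scaling(required_iops=200_000, nodes=3, per_node_max=50_000))
```

A transient spike that fits within the hosts' headroom is handled by a limit change that can be reversed once the spike passes, while sustained growth beyond that headroom still calls for adding nodes.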
BENEFITS
The Robin Systems platform is architected from the ground up to deliver a complete shared platform for hosting all of
an enterprise’s data and data-driven distributed applications. Some of the key benefits are described below.
Operational Agility & Simplicity
» Single-click provisioning of clusters and complex distributed applications
» Push-button cluster extend, application cloning and snapshots
» Improved application uptime with auto-failover
Lower Costs
» CAPEX Reduction – Potential savings of up to 40% with lower HW footprint
» Lower software licensing cost through application consolidation on shared hardware
Better, Predictable Application Performance
» Application consolidation with bare metal performance
» Automatic Application-to-Spindle performance SLA enforcement
SUMMARY
Many companies have experimented with Docker and other container technologies, but they have discovered
that these tools, along with their basic storage support in volume plugins, solve only the problem of deployment
and scale. They are unable to address challenges with container failover, data and performance management,
and transient workloads, which are critical for distributed platforms like DataStax Enterprise.
Robin Systems, together with DataStax, provides the complete package for data management, where data
administrators and consumers can focus on their use cases while the tedious tasks of deployment, backup,
restore, cloning, scaling, and performance management are completely automated. This greatly improves IT
productivity and enables IT teams to support and deliver on the promise of agility.
DataStax is a registered trademark of DataStax, Inc. and its subsidiaries in the United States and/or other countries. Apache Cassandra, and Cassandra are trademarks of the Apache Software Foundation or its subsidiaries in Canada, the United States and/or other countries.