Clustrix Database Overview


Uploaded by clustrix-database on 03-Dec-2014


DESCRIPTION

Clustrix is the leading scale-out SQL database engineered for the cloud. With Clustrix, you can scale transaction throughput, run real-time analytics and simplify operations.

TRANSCRIPT

Page 1: Clustrix Database Overview

CLUSTRIX OVERVIEW

The Leading Scale-out SQL Database. Engineered for the Cloud

Presenters

Page 2

WHAT IS CLUSTRIX

The Leading Scale-out SQL Database. Engineered for the Cloud

Deployment options: Flash Appliance (private cloud) and DBaaS (public cloud)

• Vertically integrated solution
• In-house: private data center/colocation facility
• Maximum flexibility
• The only scalable primary SQL database in AWS
• Fully managed
• Monthly subscription
• Uses flash appliance

Page 3

TARGET APPLICATIONS

• E-commerce
• Gaming
• Agoge
• Consumer web
• Advertising analytics
• Healthcare analytics
• SaaS

BILLIONS OF ROWS

BILLIONS OF TRANSACTIONS

REAL-TIME ANALYTICS

MILLIONS OF USERS/DEVICES

Page 4

IT’S TIME TO REINVENT THE SQL DATABASE

WEB-SCALE APPLICATIONS
• Growing data sets
• Millions of users
• High concurrency
• Real-time analytics
• Billions of transactions

CLOUD COMPUTING
• Scale-out architecture
• Fault tolerant
• Easy management

Page 5

SCALING A DATABASE IS HARD

Approaches: scale-up, sharding, NoSQL

CUSTOMER PRIORITIES: time to market, cost, scale and performance, operational simplicity

• Scale-up: an expensive band-aid
• Sharding: engineering and ops overhead; relational logic moves into the application
• NoSQL: engineering and ops overhead; relational logic moves into the application

Page 6

CLUSTRIX: BUILT FOR SCALE AND THE CLOUD

HIGH-SCALE TRANSACTIONS
• Linear scalability for writes/updates/reads
• Doubling the nodes doubles transactions/sec

REAL-TIME ANALYTICS
• Linear speedup for analytics
• Doubling the nodes halves query time

ACID, SQL AND MYSQL

SELF-MANAGING

BUILT-IN FAULT TOLERANCE

SCALE-OUT
• Add nodes as demand grows

REAL WORKLOADS
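The doubling claims under HIGH-SCALE TRANSACTIONS and REAL-TIME ANALYTICS describe ideal linear scaling. A back-of-the-envelope sketch, assuming a fixed per-node throughput and a perfectly divisible query; the function names and the 5,000 TPS / 60-second inputs are hypothetical, not Clustrix measurements. `scaling_efficiency` shows one way a "near-linear" result could be expressed as a fraction of the ideal.

```python
def ideal_tps(nodes: int, tps_per_node: float) -> float:
    """Ideal linear scaling: aggregate throughput is proportional to node count."""
    return nodes * tps_per_node


def ideal_query_time(nodes: int, single_node_seconds: float) -> float:
    """Ideal linear speedup: a perfectly parallel query takes T1 / n seconds."""
    return single_node_seconds / nodes


def scaling_efficiency(nodes: int, measured_tps: float, tps_per_node: float) -> float:
    """Measured throughput as a fraction of the ideal (1.0 means perfectly linear)."""
    return measured_tps / ideal_tps(nodes, tps_per_node)


# Hypothetical inputs: 5,000 TPS per node and a 60-second single-node query.
for n in (3, 6, 12):
    print(f"{n:2d} nodes: {ideal_tps(n, 5_000):>8,.0f} TPS, "
          f"{ideal_query_time(n, 60.0):5.1f} s per query")
```

Real clusters pay for coordination and data movement, which is why later slides say "near-linear"; an efficiency of 0.9 would mean 90% of the ideal.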

Page 7

PERFORMANCE AND SCALE

High-Scale Transactions
• Near-linear scalability for reads/writes/updates
• Add more nodes to handle more TPS

Real-Time Analytics
• Near-linear speedup for analytics
• More nodes, faster queries

Massive Media
• 20 million+ users / 70,000+ TPS
• Write-heavy workload; 1TB+ writes/day

Page 8

CLUSTRIX DESIGN

• Intelligent data distribution
• Massively parallel query processing
• Shared-nothing architecture

Diagram: three nodes, each with its own Query Compiler, Database Engine, and Data Map; SQL statements can be sent to any node.

Page 9

INTELLIGENT DATA DISTRIBUTION

Tables with billions of rows
• Tables are split into slices
• Each slice has a replica on another node
• Adding a node triggers a re-balance
• Losing a node triggers a re-protect

Diagram: slices S1 to S5, each stored on two different nodes.
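The slicing and re-balance bullets can be sketched in a few lines. This is a minimal illustration, assuming hash-based slicing with CRC-32, exactly two copies per slice, round-robin initial placement, and a greedy re-balance; all of these policies and every function name are assumptions, not Clustrix's actual distribution algorithm.

```python
import zlib
from collections import defaultdict

NUM_SLICES = 5  # hypothetical; mirrors the S1 to S5 slices in the diagram


def slice_for_key(key: str, num_slices: int = NUM_SLICES) -> int:
    """Map a row's distribution key to a slice with a stable hash (CRC-32)."""
    return zlib.crc32(key.encode("utf-8")) % num_slices


def place_slices(num_slices: int, nodes: list[str]) -> dict[int, list[str]]:
    """Place two copies of each slice on two different nodes, round-robin.
    Requires at least two nodes."""
    return {
        s: [nodes[s % len(nodes)], nodes[(s + 1) % len(nodes)]]
        for s in range(num_slices)
    }


def load(placement: dict[int, list[str]]) -> defaultdict:
    """Count how many slice copies each node holds."""
    counts: defaultdict = defaultdict(int)
    for copies in placement.values():
        for node in copies:
            counts[node] += 1
    return counts


def rebalance(placement: dict[int, list[str]], nodes: list[str]) -> dict[int, list[str]]:
    """Greedy re-balance after adding nodes: move one copy at a time from the
    busiest node to the least-loaded node, never putting both copies of a
    slice on the same node, until loads differ by at most one."""
    while True:
        counts = load(placement)
        busiest = max(nodes, key=lambda n: counts[n])
        idlest = min(nodes, key=lambda n: counts[n])
        if counts[busiest] - counts[idlest] <= 1:
            return placement
        # A legal move always exists: busiest holds more copies than idlest,
        # so at least one slice on busiest is not on idlest.
        for copies in placement.values():
            if busiest in copies and idlest not in copies:
                copies[copies.index(busiest)] = idlest
                break


nodes = ["n1", "n2", "n3", "n4"]
placement = place_slices(NUM_SLICES, nodes)
placement = rebalance(placement, nodes + ["n5"])  # a fifth node joins
print(dict(load(placement)))
```

Moving one copy at a time keeps both copies of every slice available throughout, which is the property the re-balance bullet depends on.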

Page 10

PARALLEL QUERY PROCESSING

Simple queries
• Fielded by any node
• Routed to the node holding the data

Complex queries
• Split into query fragments
• Fragments processed in parallel
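Routing a simple query can be sketched as a lookup in a data map that every node shares: whichever node receives the query finds the key's slice and forwards the read to the owner. The `Cluster` class, its methods, the slice count, and the CRC-32 hash are hypothetical illustrations, not Clustrix internals.

```python
import zlib

NUM_SLICES = 6  # hypothetical slice count


class Cluster:
    """Toy shared-nothing cluster: each node stores only the slices it owns."""

    def __init__(self, node_names: list[str]) -> None:
        self.storage = {name: {} for name in node_names}  # node -> {slice: {key: row}}
        # Data map, assumed identical on every node: slice id -> owning node.
        self.data_map = {s: node_names[s % len(node_names)] for s in range(NUM_SLICES)}

    def slice_for(self, key: int) -> int:
        return zlib.crc32(str(key).encode("utf-8")) % NUM_SLICES

    def insert(self, key: int, row: dict) -> None:
        s = self.slice_for(key)
        self.storage[self.data_map[s]].setdefault(s, {})[key] = row

    def point_select(self, receiving_node: str, key: int):
        """Any node can field the query: it looks up the key's slice in the
        data map and routes the read to the owning node (one network hop if
        that is a different node)."""
        s = self.slice_for(key)
        owner = self.data_map[s]
        hops = 0 if owner == receiving_node else 1
        row = self.storage[owner].get(s, {}).get(key)
        return row, owner, hops


cluster = Cluster(["n1", "n2", "n3"])
cluster.insert(42, {"id": 42, "name": "example"})
for node in ["n1", "n2", "n3"]:
    print(node, cluster.point_select(node, 42))
```

Every receiving node returns the same row from the same owner; only the hop count differs.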

Page 11

REPLICATION AND DISASTER RECOVERY

Replication
• MySQL to Clustrix replication
• Clustrix to MySQL replication

Disaster recovery
• Asynchronous replication

Backup
• mysqldump backup
• Clustrix Parallel Backup: fast backup

Page 12

CLUSTRIX TOOLS: INSIGHT

Real-time and historical insight into query performance

Monitor database health

Page 13

DATABASE LANDSCAPE

Workloads
• Transactions (OLTP): 10s of terabytes, online; best fit: row stores
• Real-time analytics (OLAP): 10s of terabytes, online; best fit: either
• Data warehousing: petabytes, offline; best fit: column stores

Architectures
• Single-node row stores: MySQL, MS SQL Server, IBM DB2, Oracle
• In-memory row stores: MemSQL, VoltDB, MySQL Cluster
• Shared-data row stores: Oracle RAC, NuoDB
• Shared-nothing row store: Clustrix
• In-memory column stores: SAP HANA
• Shared-nothing column stores: HP Vertica, EMC Greenplum, Amazon Redshift

Chart axes: query complexity and concurrent writes/updates, ranging from single-node query processing to massively parallel processing and from 1TB to 100s of TBs of data.

Page 14

USE CASES

High-Scale Transactions: 10x scale without DB experts or app changes

MySQL Consolidation: TCO cut to 1/10th by eliminating database sprawl

Business Critical MySQL: 90% lower downtime with 50% lower TCO

Operational Intelligence: 200% performance gain with 50% lower TCO

Page 15

QUESTIONS AND NEXT STEPS

Questions?

Page 16

OPERATIONAL INTELLIGENCE

Analytics application: professionals provide expert advice to improve patient outcomes

THE CHALLENGE
• Microsoft SQL Server; MedExpert proprietary treatment research
• New DoD & Medicare contracts; expected 100x increase in usage

One scale-out database
• 4 nodes, growing to 20
• Minor application changes and tuning

Alternatives considered
• Fusion I/O: 20% boost
• No time (TTM) to shard the application

Why Clustrix
• POC showed a performance boost for analytics queries and linear scale for the long term

Clustrix results
• 50% to 200% faster query response
• TCO less than 50% in the near term

Page 17

HIGH-SCALE TRANSACTIONS

Massive Media
• Write-heavy workload with 1TB+ writes per day
• 20 million+ users / 70,000+ TPS

CLUSTRIX: 18 NODES
• 11x+ the TPS of a single MySQL server
• 20B+ rows of data

“Pre-Clustrix, we spent a lot of time on optimizing for performance and scale. Now we can spend those resources better.”
Toon Coppens, CTO and Co-Founder, Massive Media

Page 18

BUSINESS CRITICAL MYSQL

SaaS application: low-cost course materials for education

THE CHALLENGE
• Chaotic/unstable MySQL environment
• Back-to-school expansion: uptime during critical peak season

Deployment
• 3-node clusters in 2 geographic locations
• Automated fault tolerance and easy expansion

Alternatives considered
• Hardware upgrade: a stop-gap
• Replication implementation was unstable and custom

Why Clustrix?
• POC showed it was easy to upgrade and expand a live Ruby on Rails application

Clustrix results
• 80% reduction in downtime
• 50% TCO reduction

Page 19

MYSQL CONSOLIDATION

E-commerce: ¥1.2 trillion per year

CHALLENGE
• MySQL sprawl: 1,150 databases, 100 DBAs
• Availability is the #1 priority

Private DBaaS
• 10:1 compression
• Re-deploy staff

Alternatives considered
• Fusion I/O couldn’t keep up
• MySQL tools: too unstable

Clustrix results
• 90% lower TCO
• 14 nodes today, growing to 35

Page 20

CLUSTRIX TECHNOLOGY

Intelligent Data Distribution
• Tables with billions of rows are split into slices
• Auto-distribute, auto-protect, re-protect

Parallel Query Evaluation
• Normal queries: fielded by any node, routed to the data node
• Complex queries: split into query fragments, fragments processed in parallel

Diagram: SQL and JSON requests arriving at the nodes, numbered query fragments 1 to 3, and slices S1 and S2 with their copies.

• Application sees a linearly scalable, single-instance MySQL database
• Automatic fault tolerance
• Online expansion, data (re)distribution, and schema changes

Page 21

SCALE AND FAULT TOLERANCE

• All data has multiple copies on different nodes
• Re-balance on adding a node
• Re-protect on losing a node

Diagram: slices A to E, each with two copies, spread across the nodes.
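Re-protect can be sketched as restoring the copy count of every slice that lost a copy, placing each new copy on the least-loaded surviving node that does not already hold that slice. The two-copy default, the slice labels A to E, the node names, and the placement policy are assumptions for illustration only.

```python
def reprotect(placement: dict[str, list[str]], failed: str, nodes: list[str],
              copies: int = 2) -> dict[str, list[str]]:
    """Drop the failed node's copies, then restore each slice to `copies`
    copies on the least-loaded surviving nodes that don't already hold it."""
    survivors = [n for n in nodes if n != failed]
    load = {n: 0 for n in survivors}
    for holders in placement.values():
        holders[:] = [n for n in holders if n != failed]
        if not holders:
            raise RuntimeError("every copy of a slice was lost")
        for n in holders:
            load[n] += 1
    for holders in placement.values():
        while len(holders) < copies:
            candidates = [n for n in survivors if n not in holders]
            if not candidates:
                raise RuntimeError("not enough surviving nodes to re-protect")
            target = min(candidates, key=lambda n: load[n])
            holders.append(target)  # copy the slice from its surviving holder
            load[target] += 1
    return placement


nodes = ["n1", "n2", "n3", "n4"]
placement = {"A": ["n1", "n2"], "B": ["n2", "n3"], "C": ["n3", "n4"],
             "D": ["n4", "n1"], "E": ["n1", "n3"]}
reprotect(placement, failed="n1", nodes=nodes)
print(placement)
```

Because every slice had a second copy elsewhere, losing n1 loses no data; re-protect only restores redundancy.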

Page 22

PARALLEL QUERY PROCESSING

Simple queries
• Fielded by any node
• Routed to the node holding the data

Complex queries
• Split into query fragments
• Fragments processed in parallel
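For complex queries, the fragment idea can be sketched as a scatter/gather aggregate: each fragment filters and partially aggregates one slice, and the partial results are merged. The `orders` table, its `status` and `amount` columns, and the use of threads as stand-ins for nodes are hypothetical; Python threads illustrate the decomposition, not real CPU parallelism.

```python
from concurrent.futures import ThreadPoolExecutor


def fragment(slice_rows: list[dict]) -> tuple[int, int]:
    """One query fragment, run against one slice: filter, then partially aggregate."""
    amounts = [r["amount"] for r in slice_rows if r["status"] == "shipped"]
    return len(amounts), sum(amounts)


def parallel_count_and_sum(slices: list[list[dict]]) -> tuple[int, int]:
    """Fan fragments out to one worker per slice, then merge the partial results.
    Equivalent to: SELECT COUNT(*), SUM(amount) FROM orders WHERE status = 'shipped'."""
    with ThreadPoolExecutor(max_workers=max(1, len(slices))) as pool:
        partials = list(pool.map(fragment, slices))
    return sum(c for c, _ in partials), sum(s for _, s in partials)


slices = [
    [{"status": "shipped", "amount": 10}, {"status": "open", "amount": 5}],
    [{"status": "shipped", "amount": 7}],
    [],
]
print(parallel_count_and_sum(slices))  # (2, 17)
```

COUNT and SUM merge by simple addition; an AVG would be merged from the partial counts and sums rather than by averaging averages.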

Page 23

ANALYTIC QUERY PROCESSING

SELECT a, b FROM A JOIN B USING (id) WHERE A.a = 15

1. Start query
2. Read A, apply filter
3. Send each row to the correct node based on id
4. Read B and join
5. Return to user

Analytic queries get speedup from massively parallel processing:
• Concurrent parallelism
• Pipeline parallelism
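The five steps above amount to a repartitioned hash join. A minimal single-process sketch, assuming integer ids, B already distributed across nodes by `id % NUM_NODES`, and hypothetical table contents; it models only the data movement, not the concurrent and pipeline parallelism a real cluster would add.

```python
from collections import defaultdict

NUM_NODES = 3  # hypothetical cluster size


def node_for(id_value: int) -> int:
    """Assumed distribution rule: rows with the same id live on the same node."""
    return id_value % NUM_NODES


def partition(rows: list[dict]) -> dict[int, list[dict]]:
    parts: dict[int, list[dict]] = defaultdict(list)
    for row in rows:
        parts[node_for(row["id"])].append(row)
    return parts


def distributed_join(a_slices: list[list[dict]], b_by_node: dict[int, list[dict]]):
    """SELECT a, b FROM A JOIN B USING (id) WHERE A.a = 15, step by step."""
    # Read each slice of A, apply the filter A.a = 15, and send every
    # surviving row to the node that owns that id.
    inbox: dict[int, list[dict]] = defaultdict(list)
    for a_rows in a_slices:
        for row in a_rows:
            if row["a"] == 15:
                inbox[node_for(row["id"])].append(row)
    # Each node reads its local B rows and joins them with the rows it received.
    results = []
    for node, a_rows in inbox.items():
        b_index: dict[int, list[dict]] = defaultdict(list)
        for b_row in b_by_node.get(node, []):
            b_index[b_row["id"]].append(b_row)
        for row in a_rows:
            for b_row in b_index[row["id"]]:
                results.append((row["a"], b_row["b"]))
    # Return the combined per-node results to the user.
    return results


a_slices = [
    [{"id": 1, "a": 15}, {"id": 2, "a": 3}],
    [{"id": 4, "a": 15}],
    [{"id": 5, "a": 15}],
]
b_by_node = partition([{"id": 1, "b": "x"}, {"id": 4, "b": "y"}, {"id": 2, "b": "z"}])
print(sorted(distributed_join(a_slices, b_by_node)))  # [(15, 'x'), (15, 'y')]
```

Filtering before sending means only matching rows of A cross the network, which is the main saving of this plan.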

Page 24

SQL FOR STRUCTURED DATA

Timeline, 1970 to 2010 (relational structured data vs. unstructured data)
• SQL wins; hierarchical and network models lose (System R, Ingres)
• SQL wins; ER and object models lose (Oracle, Postgres)
• Single-node SQL struggles
• Structured data: distributed SQL wins (distributed SQL primary: Clustrix; distributed SQL warehousing: Vertica, Greenplum)
• Unstructured data: NoSQL wins (MongoDB, CouchDB, Hadoop)

With increasing data size, struggling old SQL implementations are replaced by new distributed SQL.

Page 25

CLUSTRIX APPLIANCE

Clustrix Appliance 3-Node Cluster (CLX 4110)

• 24 Intel Xeon CPU cores
• 144GB RAM
• 6GB NVRAM
• 1.35TB protected data capacity on Intel SSD (2.7TB raw)
• Low-latency InfiniBand interconnect
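The 1.35TB protected / 2.7TB raw pairing is consistent with every slice being stored twice, as the fault-tolerance slides describe. A small arithmetic sketch; the two-copy factor is inferred from that ratio, and the per-node split assumes the three nodes are identical, which the slide does not state.

```python
NODES = 3
TOTAL_CORES = 24
TOTAL_RAM_GB = 144
TOTAL_NVRAM_GB = 6
RAW_SSD_TB = 2.7
COPIES = 2  # assumption: inferred from the 2:1 ratio of raw to protected capacity


def protected_capacity_tb(raw_tb: float, copies: int) -> float:
    """Usable capacity when every slice is stored `copies` times."""
    return raw_tb / copies


# Per-node figures assume the three nodes are identical (not stated on the slide).
per_node = {
    "cores": TOTAL_CORES // NODES,
    "ram_gb": TOTAL_RAM_GB // NODES,
    "nvram_gb": TOTAL_NVRAM_GB // NODES,
}
print(f"protected capacity: {protected_capacity_tb(RAW_SSD_TB, COPIES):.2f} TB")
print(per_node)
```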