2
Agenda
• MySQL visual analysis
• Design considerations
• Web scale challenges
• Characteristics of a
distributed database
• ScaleBase Analysis Genie
• Demo
• Q & A
– Please enter your questions on the GTW side panel
3
Vladi Vexler
Vice President,
Technology and Product Marketing
• Over 15 years of experience in software development
and product management
• Experienced in cloud, web and enterprise
• Author of patents in the fields of database innovation,
dynamic data caching and machine learning analytics
4
Who Are We?
Distributed Database Management System
Architected for the Cloud
Simple. Reliable. Powerful.
6
What Is Your Goal?
• Scale (mostly) reads
• Scale (mostly) writes
• Performance of reads
– Affected by joins and scans of big tables
• Performance of writes
– Affected by I/O reads and writes, CPU and table indexes (a growing overhead)
• Locks
• CPU/IO/RAM issues
• Load peaks
• Data growth
• Geo-distribution, special data distribution needs
7
Database And Tables Metrics to Review
• Size
– Physical size on disk, Logical size (number of rows)
• Multiple/large indices
– Physical impact (write time) and logical impact (RAM)
• Reads vs. Writes
– Number of queries per table?
– % of total MySQL traffic
– % of table’s traffic
• Logical data relations – identify and analyze
– Joins – complexity of data distribution and data access
– Logical Data Chunks – related data in multiple tables
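The reads vs. writes metrics above can be sketched in a few lines of Python. This is only an illustration: the `captured` sample, the table names and the `table_traffic` helper are hypothetical, and a real analysis would work from captured SQL traffic rather than a hand-written list.

```python
from collections import Counter

# Hypothetical sample of captured statements as (table, statement kind) pairs.
captured = [
    ("users", "SELECT"), ("users", "SELECT"), ("users", "UPDATE"),
    ("photos", "SELECT"), ("photos", "INSERT"), ("photos", "INSERT"),
    ("settings", "SELECT"), ("users", "SELECT"),
]

READ_KINDS = {"SELECT"}  # assumption: everything else counts as a write


def table_traffic(statements):
    """Return per-table query count, share of total traffic and read share."""
    totals = Counter(table for table, _ in statements)
    reads = Counter(table for table, kind in statements if kind in READ_KINDS)
    grand_total = len(statements)
    report = {}
    for table, count in totals.items():
        report[table] = {
            "queries": count,                              # number of queries per table
            "pct_of_total": 100.0 * count / grand_total,   # % of total traffic
            "pct_reads": 100.0 * reads[table] / count,     # % of the table's traffic that is reads
        }
    return report


report = table_traffic(captured)
for table, row in sorted(report.items()):
    print(table, row)
```

Tables with a high share of total traffic and a high write share are usually the first candidates to examine when planning distribution.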
9
Scale Out Platform Considerations
DIY <> NewSQL <> NoSQL <> ScaleBase
• Short-term cost vs long-term cost
– Do-it-yourself: open source is not truly free
– Time to market
– Pareto principle – 20% of complications will take 80% of the time
– High overhead cost in maintenance and future developments
• Reliability (ACID) vs. simplicity (BASE)
• Maturity and availability/reliability
• Features and limitations
• How to define a good data distribution policy?
– How to evaluate efficiency of a policy for data distribution and access?
– How to simulate different distribution policies and compare?
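One rough way to compare distribution policies is to replay a sample workload against each candidate and count how many queries could be answered from a single shard. The sketch below assumes a heavily simplified model: the `workload` records, both policies and the `single_shard_share` helper are hypothetical, and a query is treated as single-shard only when it filters on the shard key of every table it touches.

```python
# Hypothetical workload: each query lists the tables it touches and the
# columns it filters on with an equality predicate.
workload = [
    {"tables": {"users"}, "filters": {"user_id"}},
    {"tables": {"users", "photos"}, "filters": {"user_id"}},
    {"tables": {"photos"}, "filters": {"photo_id"}},
    {"tables": {"photos"}, "filters": {"user_id"}},
]

# Two candidate policies: table name -> shard key column (illustrative only).
policy_a = {"users": "user_id", "photos": "user_id"}
policy_b = {"users": "user_id", "photos": "photo_id"}


def single_shard_share(queries, policy):
    """Fraction of queries that filter on the shard key of every table they touch."""
    hits = 0
    for query in queries:
        keys = {policy[t] for t in query["tables"] if t in policy}
        if keys <= query["filters"]:   # every required shard key is pinned by a filter
            hits += 1
    return hits / len(queries)


print(single_shard_share(workload, policy_a))  # 0.75
print(single_shard_share(workload, policy_b))  # 0.5
```

Under these assumptions, sharding both tables by `user_id` keeps more of the sample workload on one shard; a real comparison would also weigh query frequency and data balance.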
12
Distributed Table Types
• MASTER: Data on one shard only
– Example: general settings
• GLOBAL: Data copied to all shards
– Example: lookups
• DISTRIBUTED (root): Each row on a single shard, selected by a key
– Example: Users table
• CASCADED (distributed child table): Each row on a single shard;
however, distribution and access depend on the parent table
– Example: User_Photos, User_Photos_Likes – depend on Users
Note: Not all sharding platforms support Cascaded and Master table types
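As a minimal sketch of how the four table types could drive write routing, the Python below maps each type to the shards a row is written to. Everything here is an assumption for illustration: the shard names, the `POLICY` table, the CRC32 hash and the idea that a cascaded row carries its parent's key. It does not describe how any particular product implements routing.

```python
import zlib

SHARDS = ["shard0", "shard1", "shard2"]   # hypothetical shard names
MASTER_SHARD = SHARDS[0]                  # assumption: master tables live on the first shard

# Hypothetical policy: table -> (table type, shard key column or None).
POLICY = {
    "settings": ("MASTER", None),
    "countries": ("GLOBAL", None),
    "users": ("DISTRIBUTED", "user_id"),
    "user_photos": ("CASCADED", "user_id"),  # follows its parent table, users
}


def shard_for_key(value):
    """Map a key to a shard with a stable hash (CRC32 is an illustrative choice)."""
    return SHARDS[zlib.crc32(str(value).encode()) % len(SHARDS)]


def write_targets(table, row):
    """Return the shards that must receive a write of `row` into `table`."""
    kind, key = POLICY[table]
    if kind == "MASTER":
        return [MASTER_SHARD]        # data on one shard only
    if kind == "GLOBAL":
        return list(SHARDS)          # data copied to all shards
    # DISTRIBUTED and CASCADED rows both follow the root key, so a user's
    # photos land on the same shard as the user row.
    return [shard_for_key(row[key])]


print(write_targets("countries", {"code": "FR"}))
print(write_targets("users", {"user_id": 42}))
print(write_targets("user_photos", {"user_id": 42, "photo_id": 7}))
```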
13
Distributed Queries Types
• ONE_DB – Single-shard execution: Global or Master tables, Distributed
& Cascaded tables, joins of Distributed and Global tables
• ALL_DB – All-shards execution, one DB-node in a shard cluster:
– SELECT and Aggregate data from many shards – Parallel execution
(“map reduce” style) on all shards, Aggregate, Order, Group-By, Limit
– DDL statements
– DML on Global tables
• FULL_DB – Session statements (USE, SET) to be sent to all database
nodes in all shard clusters
• CROSS_DB – Sharding conflict resolution, such as cross-shard joins.
Note: Not all sharding platforms support ALL_DB, FULL_DB and CROSS_DB queries.
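The "map reduce" style of an ALL_DB query can be illustrated with a small Python sketch: each shard computes a partial GROUP BY, and the partial results are merged, ordered and limited afterwards. The in-memory `shards` data and both helper functions are hypothetical stand-ins for real per-shard SQL execution.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shards: each holds (country, user_id) rows of a distributed table.
shards = {
    "shard0": [("US", 1), ("FR", 2), ("US", 3)],
    "shard1": [("FR", 4), ("DE", 5)],
    "shard2": [("US", 6), ("DE", 7), ("DE", 8)],
}


def count_by_country(rows):
    """'Map' step: the per-shard GROUP BY country, COUNT(*)."""
    return Counter(country for country, _ in rows)


def all_db_top_countries(limit):
    """Run the per-shard step in parallel, then merge, order and limit."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(count_by_country, shards.values()))
    merged = Counter()
    for partial in partials:
        merged.update(partial)   # 'reduce' step: sum the partial counts
    # ORDER BY count DESC, country ASC, then LIMIT
    ordered = sorted(merged.items(), key=lambda kv: (-kv[1], kv[0]))
    return ordered[:limit]


print(all_db_top_countries(2))  # [('DE', 3), ('US', 3)]
```

Note that ordering and limiting happen after the merge; applying the limit on each shard first would be safe for some aggregates but not for counts like these.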
14
Importance of Logical Data Chunks
• Example: A Logical Data Chunk in a Facebook app:
– All rows in tables containing information related to George, from:
Users, Photos, Comments, Likes, Posts, Friends etc…
• Goals:
1. Optimal Data Distribution: Store as many complete logical data chunks
as possible in the same shard
2. Maximize ONE_DB and ALL_DB queries
3. Handle all complex cases: related data is in multiple shards
– ALL_DB, CROSS_DB, FULL_DB queries
15
Data Relationships can be Extremely Complex
Usually, scale out is applied to growing-mature apps.
How do you define an optimal data distribution policy?
17
ScaleBase Analysis Genie
• A tool enabling MySQL visual analysis and building an optimal data
distribution policy; Designed for DBAs, Architects & Dev. Managers
• Two-step process:
– Analysis Assistant
   – An agent that captures app/DB information, including SQL traffic and
     database metrics
   – Obfuscates, summarizes and packages the App-DB data
– Analysis Genie
   – A SaaS application that receives the Analysis Assistant (AA) package,
     presents the visual analysis and details the policy configuration
18
ScaleBase Analysis Genie
• Advanced analytics
– Your schemas, data &
queries
• Identification of best
data distribution policy
– Customized for even the
most complex apps
• Complete policy control
• Quality assurance
– Review before production
• Policy simulation
– “What-if” analysis
https://www.scalebase.com/software/
20
Relationship Identification
Mapping includes:
• Schema structures
• Table and column name matching
• Query parsing and identification of joined
tables and columns
• Statistics on the size and access of
every object
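As a rough illustration of query parsing for relationship identification, the Python sketch below extracts equality predicates between qualified columns and counts how often each pair of columns is joined. It assumes queries qualify columns with full table names (no aliases) and uses a simple regular expression rather than a real SQL parser; the sample queries and the `join_relations` helper are hypothetical.

```python
import re
from collections import Counter

# Matches equality predicates between two qualified columns,
# e.g. users.id = photos.user_id (table qualifiers must start with a letter or _).
JOIN_PREDICATE = re.compile(r"([A-Za-z_]\w*)\.(\w+)\s*=\s*([A-Za-z_]\w*)\.(\w+)")


def join_relations(queries):
    """Count how often each pair of table.column references is joined."""
    relations = Counter()
    for sql in queries:
        for t1, c1, t2, c2 in JOIN_PREDICATE.findall(sql):
            if t1.lower() == t2.lower():
                continue  # same-table comparison, not a relationship between tables
            pair = tuple(sorted([f"{t1}.{c1}".lower(), f"{t2}.{c2}".lower()]))
            relations[pair] += 1
    return relations


queries = [
    "SELECT * FROM users JOIN photos ON users.id = photos.user_id WHERE users.id = 7",
    "SELECT * FROM photos JOIN users ON photos.user_id = users.id",
    "SELECT * FROM photos JOIN likes ON likes.photo_id = photos.id",
]

relations = join_relations(queries)
for pair, count in relations.most_common():
    print(pair, count)
```

Frequently joined column pairs are natural hints for parent/child (cascaded) relationships, though aliases, subqueries and implicit joins would need a proper parser.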
25
ScaleBase Genie and ScaleBase Enterprise
Demo Environment
• Visual analysis
• Distribution policy identification and configuration
• Scale out load via data sharding (massive scale out)
27
Customer: Million+ User Online Gaming Company
Who:
• Mobile gaming company expanding globally
• Hosted on SoftLayer cloud in Hong Kong
Problem:
• Over a million downloads - peak period overload
• Needed scaling in place for expansion
Alternatives considered:
• Manual sharding/open source tools
• Other commercial solutions were too costly
Solution:
• Used visual analysis to determine optimized policy
• Up and running within a few weeks of initial download and now supports hundreds of
thousands of daily users
• Fully operational using data distribution and anticipating additional scale out within
next quarter
28
ScaleBase Distributed Database Management System
• Scale out to unlimited users
• Continuous availability
• Dynamic workload optimization
• Fast and simple deployment
• Easily scale out a single MySQL instance
• Optimized for the Cloud
• Reduces time-to-market
• No changes needed to app or database
• Database usage analytics
• Intelligent load balancing
• Centralized data management
29
Get Instant Application/Database Insight!
Use visual analysis to plan your scale out strategy
Download the Analysis Genie here:
https://www.scalebase.com/software
Questions?
Contact Info:
Paul Campaniello
Vladi Vexler
Resources:
www.scalebase.com
www.scalebase.com/resources
www.scalebase.com/blog
(617) 630.2800