
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.

MySQL options on AWS: Self-managed, managed, and serverless

DAT316

Chayan Biswas

Sr. Product Manager

Amazon Web Services

Yoav Eilat

Sr. Product Manager

Amazon Web Services

Agenda

MySQL options on AWS

1. Self-managed MySQL on Amazon EC2

2. Amazon RDS for MySQL & RDS for MariaDB

3. Aurora MySQL – Provisioned and Serverless

Deep dive into which to use when

AWS offers three options for running MySQL

[Diagram: the stack layers (orchestration, proxy, application, database, OS, storage) shown for two models. Self-managed: Amazon EC2. AWS-managed: Amazon RDS (MySQL & MariaDB) and Amazon Aurora.]

Benefits of self-managed MySQL on EC2

[Diagram: stack layers: orchestration, proxy, application, database, OS, storage]

Amazon Relational Database Service (Amazon RDS)

Managed relational database service with a choice of popular database engines

Easy to administer: Easily deploy and maintain hardware, OS, and DB software; built-in monitoring

Performant & scalable: Scale compute and storage with a few clicks; minimal downtime for your application

Available & durable: Automatic Multi-AZ data replication; automated backup, snapshots, and failover

Secure & compliant: Data encryption at rest and in transit; industry compliance and assurance programs

Microsoft SQL Server Oracle

Amazon Aurora: Enterprise database at open source price

Delivered as a managed service

Amazon Aurora

Speed and availability of high-end commercial databases

Simplicity and cost-effectiveness of open-source databases

Drop-in compatibility with MySQL and PostgreSQL

Simple pay-as-you-go pricing


What do you want from your database service?

Customization

RDS (AWS-managed) customizations

Suitable for almost all MySQL customers

Database stack:
• Backups: Set up backups & manual snapshots
• Compute instance: Select supported family
• Database: Select current database version
• Database storage: Choose storage type & size
• Plugins & extensions: Configure via parameter groups
• Orchestration: Via console, CLI, API

Amazon EC2 (self-managed) customizations

Suitable for some edge cases

Database stack:
• Backups: You’re on your own
• Compute instance: Select EC2 instance
• Database: Select any database
• Database storage: Select any AWS storage
• Plugins & extensions: Install anything
• Orchestration: You’re on your own

Manageability

Amazon RDS manageability highlights

• Automated OS and database upgrades

• Push-button scaling

• Managed binlog replication

• Log upload to Amazon CloudWatch Logs

• Industry compliance

• Multiple instance families including R5, M5, T3

• Auto-scaling storage up to 64 TiB and 80K provisioned IOPS (PIOPS)

• Per-second billing


RDS MySQL backups, snapshots, and point-in-time restore


Two options: automated backups and manual snapshots

EBS snapshots stored in Amazon S3

Transaction logs stored every 5 minutes in Amazon S3 to support point-in-time recovery

No performance penalty for backups

Snapshots can be copied across Regions or shared with other accounts
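The restore path these bullets imply can be sketched roughly as follows: choose the most recent snapshot at or before the target time, then replay the transaction log batches that lead up to the target. This is only an illustrative model, not how RDS works internally. The function name, the argument names, and the idea that each log batch covers the five minutes ending at its upload time are all assumptions.

```python
from datetime import datetime, timedelta

LOG_INTERVAL = timedelta(minutes=5)  # upload cadence stated on the slide


def plan_point_in_time_restore(snapshot_times, log_upload_times, target):
    """Pick a base snapshot and the log batches needed to reach `target`.

    Hypothetical model: each log batch covers the LOG_INTERVAL that ends
    at its upload time.
    """
    candidates = [s for s in snapshot_times if s <= target]
    if not candidates:
        raise ValueError("no snapshot exists at or before the target time")
    base = max(candidates)
    latest_restorable = max([base] + list(log_upload_times))
    if target > latest_restorable:
        raise ValueError("target is later than the last uploaded log batch")
    # Keep batches that end after the snapshot and start before the target.
    logs = sorted(
        t for t in log_upload_times if t > base and t - LOG_INTERVAL < target
    )
    return base, logs


midnight = datetime(2019, 12, 1)
snapshots = [midnight, midnight + timedelta(days=1)]
uploads = [midnight + timedelta(minutes=5 * i) for i in range(1, 13)]
base, replay = plan_point_in_time_restore(
    snapshots, uploads, midnight + timedelta(minutes=32)
)
print(base, len(replay))
```

In this model the last batch is replayed only partially, up to the requested target time.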

[Diagram: application and EBS volumes in Availability Zones 1 and 2 of Region 1, with an EBS volume in Region 2]

Amazon Aurora scale-out, distributed architecture

Purpose-built log-structured distributed storage system designed for databases

Storage volume is striped across hundreds of storage nodes distributed over 3 different Availability Zones

Six copies of data, two copies in each Availability Zone to protect against AZ+1 failures

Continuous backups to Amazon S3 and faster restores
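The "AZ+1" claim can be checked with a little quorum arithmetic. The sketch below assumes a write quorum of 4 of 6 copies (the I/O profile slide later in this deck mentions a 4/6 quorum) and a read quorum of 3 of 6, which is not stated in the deck and is treated here as an assumption. Function names are illustrative.

```python
COPIES_PER_AZ = 2
AZ_COUNT = 3
TOTAL_COPIES = COPIES_PER_AZ * AZ_COUNT  # six copies of the data

WRITE_QUORUM = 4  # 4/6, as mentioned on the I/O profile slide
READ_QUORUM = 3   # assumption: not stated in the deck


def surviving_copies(failed_azs=0, extra_failed_copies=0):
    """Copies still reachable after losing whole AZs plus individual copies."""
    return TOTAL_COPIES - failed_azs * COPIES_PER_AZ - extra_failed_copies


def can_write(copies):
    return copies >= WRITE_QUORUM


def can_read(copies):
    return copies >= READ_QUORUM


after_az_loss = surviving_copies(failed_azs=1)
after_az_plus_one = surviving_copies(failed_azs=1, extra_failed_copies=1)
print(after_az_loss, can_write(after_az_loss))
print(after_az_plus_one, can_read(after_az_plus_one), can_write(after_az_plus_one))
```

Under these assumptions, losing one Availability Zone leaves the volume writable, and losing one more copy still leaves it readable so that the lost copies can be rebuilt.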

[Diagram: SQL, transactions, and caching layers in each of Availability Zones 1, 2, and 3, on top of a shared storage volume that is backed up to Amazon S3]

Aurora Serverless

• Starts up on demand, shuts down when not in use

• Adjusts capacity automatically

• No application impact when scaling

• Pay per second, one-minute minimum

• Great for infrequently used, unpredictable or cyclical workloads

• Data API for running queries over HTTP
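As a rough illustration of "pay per second, one-minute minimum", the sketch below computes billable time for one period of activity. The function names, the capacity-unit model, and the price used in the example are hypothetical and are not actual AWS pricing.

```python
def billed_seconds(active_seconds, minimum_seconds=60):
    """Per-second billing with a one-minute minimum per active period."""
    if active_seconds <= 0:
        return 0  # shut down and not in use: nothing to bill
    return max(active_seconds, minimum_seconds)


def estimated_cost(active_seconds, capacity_units, price_per_unit_hour):
    """Hypothetical cost model: seconds x capacity units x hourly unit price."""
    seconds = billed_seconds(active_seconds)
    return seconds * capacity_units * price_per_unit_hour / 3600


print(billed_seconds(20))   # a short burst is rounded up to the minimum
print(billed_seconds(90))   # longer activity is billed as used
print(estimated_cost(3600, capacity_units=2, price_per_unit_hour=0.06))
```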

[Diagram: application, request routers, warm pool of instances, scalable DB capacity, DB storage]

Log exports to Amazon CloudWatch Logs

• Continuously monitor activity in your DB clusters by sending audit, general, slow query, and error logs to CloudWatch Logs
• Export to S3 for long-term archival; analyze logs using Amazon Athena; visualize logs with Amazon QuickSight

Search: Look for specific events across log files

Metrics: Measure activity in your Aurora DB cluster

Visualizations: Create activity dashboards

Alarms: Get notified or take actions

[Diagram: Amazon RDS, Amazon CloudWatch, Amazon S3, Amazon Athena, Amazon QuickSight]

Monitor RDS with CloudWatch

Amazon CloudWatch metrics

• CPU/Storage/Memory

• Swap usage

• I/O (read and write)

• Latency (read and write)

• Throughput (read and write)

• Replica lag

• Amazon CloudWatch alarms

• Similar to on-premises monitoring tools

Enhanced monitoring

• Access to additional CPU, memory, file system, and disk I/O metrics

• As low as one-second intervals

Integration with third-party monitoring tools

Performance Insights

Dashboard showing database load
▪ Easy: e.g., drag and drop
▪ Powerful: drill down using zoom in

Identifies source of bottlenecks
▪ Sort by top SQL
▪ Slice by host, user, wait events

Adjustable time frame
▪ Hour, day, week, month
▪ Up to 2 years of data; 7 days free

Support for MariaDB (NEW)

[Screenshot: database load chart with Max vCPU line, a CPU bottleneck, and SQL with high CPU; MariaDB with autocommit on versus autocommit off]

Recommendations

• Best-practice recommendations from AWS
• Parameter recommendations: non-default custom memory parameters, change buffering enabled, logging to table
• Performance-related recommendations: query cache enabled, previous generation instance
• Aurora cluster recommendations: one instance, one AZ, outdated DB version

Developer Productivity


Many applications, including serverless apps, have a large number of open DB connections and high connection open/close rate, exhausting DB resources

• Modern apps can have 1,000s of DB connections, exhausting DB resources
• Custom failure handling code can contain security risks like DB credentials
• Self-managed proxy servers help manage DB load, but are difficult to deploy

Introducing Amazon RDS Proxy (Preview)

Fully managed, highly available database proxy feature for Amazon RDS. Pools and shares DB connections to make applications more scalable, more resilient to database failures, and more secure.

[Slide graphic labels: pool and share (app scaling); availability (DB failover times); data security (access controls); fully managed; compatible]

Amazon RDS Proxy regions and engines

• RDS Proxy is available in:

• Asia Pacific (Tokyo)

• EU West (Ireland)

• US East (N. Virginia)

• US East (Ohio)

• US West (Oregon)

• RDS Proxy supports the following engines:

• RDS MySQL 5.6 and 5.7

• Aurora MySQL 5.6 and 5.7

• RDS and Aurora PostgreSQL coming soon!

Amazon RDS Proxy: How it works

Connection pooling

• Session state that keeps a client on one connection instead of the shared pool: setting system variables, setting user-defined variables, calling locking functions, locking tables, creating temporary tables, and preparing statements
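The idea behind connection pooling can be sketched in a few lines: a small, fixed set of database connections is opened once and lent out to many short-lived requests. This is a generic illustration and not how RDS Proxy is implemented. `ConnectionPool`, `fake_connect`, and the pool size are hypothetical names and values.

```python
import queue


class ConnectionPool:
    """Minimal pool: open `size` connections once and share them."""

    def __init__(self, connect, size):
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(connect())

    def run(self, work):
        conn = self._idle.get()  # blocks until a connection is free
        try:
            return work(conn)
        finally:
            self._idle.put(conn)  # hand the connection back for reuse


opened = []


def fake_connect():
    """Stand-in for a real database connect call."""
    conn = object()
    opened.append(conn)
    return conn


pool = ConnectionPool(fake_connect, size=2)
used = [pool.run(lambda conn: id(conn)) for _ in range(100)]
print(len(opened), len(set(used)))
```

One hundred requests are served here while only two connections are ever opened, which is the effect the slide describes for applications with high connection open/close rates.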


Seamless and faster failovers


Application connections preserved during failovers

Detects failovers and connects to standby quicker, bypassing DNS caches

Up to 66% faster failover times

No passwords embedded in code

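// Assumes the aws-sdk (v2) and mysql packages; settings come from environment variables.
const AWS = require('aws-sdk')
const mysql = require('mysql')

async function connect() {
  // Request a short-lived IAM authentication token instead of storing a password.
  const signer = new AWS.RDS.Signer({
    username: process.env.username,
    hostname: process.env.endpoint,
    region: 'us-east-2',
    port: 3306,
  })
  const token = await signer.getAuthToken()

  // Use the token as the password for the database connection.
  const connection = mysql.createConnection({
    host: process.env.endpoint,
    user: process.env.username,
    password: token,
    database: process.env.database,
  })
  connection.connect()
  return connection
}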

Amazon RDS Data API for HTTP access

[Diagram: millions of IoT/mobile devices calling a REST API endpoint, served by the Data API fleet in front of Amazon Aurora Serverless]

Serverless apps often have restrictions

Limited network connectivity to the DB

No persistent connection to the DB

Small client (e.g., IoT) with limited resources

RDS Data API provides:

Public endpoint accessible via HTTP

Access without any client configuration
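To make the HTTP access pattern concrete, the sketch below builds the request for a Data API `ExecuteStatement` call. The ARNs, database name, table, and helper name are placeholders invented for illustration. The actual call (shown only in a comment) assumes the boto3 `rds-data` client is available with valid credentials.

```python
def build_execute_statement_request(cluster_arn, secret_arn, database, sql, params=None):
    """Build keyword arguments for an ExecuteStatement call (helper name is hypothetical)."""
    request = {
        "resourceArn": cluster_arn,  # the Aurora Serverless cluster
        "secretArn": secret_arn,     # credentials stored in AWS Secrets Manager
        "database": database,
        "sql": sql,
    }
    if params:
        # Named parameters are referenced in the SQL text as :name.
        request["parameters"] = [
            {"name": name, "value": {"stringValue": str(value)}}
            for name, value in params.items()
        ]
    return request


request = build_execute_statement_request(
    cluster_arn="arn:aws:rds:us-east-1:123456789012:cluster:example",       # placeholder
    secret_arn="arn:aws:secretsmanager:us-east-1:123456789012:secret:example",  # placeholder
    database="appdb",
    sql="SELECT * FROM devices WHERE device_id = :device_id",
    params={"device_id": "sensor-42"},
)
print(request["sql"])

# With boto3 installed and credentials configured, the call would look like:
#   import boto3
#   response = boto3.client("rds-data").execute_statement(**request)
```

No database driver or persistent connection is involved on the client side; each statement is a separate signed HTTP request.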

How do you incorporate ML-based predictions into an app?

Adding ML to an application is challenging

Typical steps require ML expertise & manual work

1. Select and train the ML model
2. Write application code to read data from the database
3. Format the data for the ML model
4. Call an ML service to run the ML model on the formatted data
5. Format the output for the application
6. Load the results to the application

Aurora machine learning

Simple, optimized, and secure integration between Aurora, Amazon SageMaker, and Amazon Comprehend (in preview)

• ML predictions on relational data
• Familiar SQL language, no ML expertise
• Security & governance
• Low-latency, immediate
• Integration with Amazon SageMaker & Amazon Comprehend

From six steps

Typical steps require ML expertise & manual work

1. Select and train the ML model
2. Write application code to read data from the database
3. Format the data for the ML model
4. Call an ML service to run the ML model on the formatted data
5. Format the output for the application
6. Load the results to the application

To three steps

1. (Optional) Select and configure the ML model with Flintstone
2. Run a SQL query to invoke the ML service
3. Use the results in the application

Use the familiar SQL language for training & prediction

From SQL to ML-driven insights

SELECT * FROM product_reviews
WHERE aws_comprehend.detect_sentiment(review_text, 'EN') = 'NEGATIVE';

CREATE TRIGGER insert_check
BEFORE INSERT ON sales
FOR EACH ROW
BEGIN
  IF is_transaction_fraudulent(column1, column2, column3 …) = 'True' THEN
    ROLLBACK;
  END IF;
END;

SELECT * FROM customers
ORDER BY predicted_future_spend(column1, column2, ...);

Aurora-optimized ML query processing

user_feedback table (ID, Feedback), as shown on the slide; only the first and last IDs are legible:

1   Great product!
    Good job
    Mediocre
    I didn’t like it
    Loved it
    Terrible service
50  Great service

SELECT * FROM user_feedback
WHERE aws_comprehend.detect_sentiment(review_text, 'EN') = 'POSITIVE';

Performance


Read scaling with RDS read replicas

Read replicas offload reads from the source database

Create up to five replicas per source database

Binlog replication (file position or GTID)

Upgrades independent of the source database

Each read replica itself can be multi-AZ
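On the application side, offloading reads usually means sending writes to the source endpoint and spreading reads across replica endpoints. The sketch below shows one simple way to do that with round-robin selection. The class name and endpoint strings are hypothetical, and the `SELECT` prefix check is a deliberate simplification.

```python
import itertools


class ReplicaRouter:
    """Route writes to the source and reads round-robin across replicas."""

    MAX_REPLICAS = 5  # limit per source database listed on the slide

    def __init__(self, source, replicas):
        if len(replicas) > self.MAX_REPLICAS:
            raise ValueError("too many replicas for one source database")
        self.source = source
        self._replicas = itertools.cycle(replicas) if replicas else None

    def endpoint_for(self, sql):
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and self._replicas is not None:
            return next(self._replicas)
        return self.source  # writes, and reads when no replica exists


router = ReplicaRouter("source.example", ["replica-1.example", "replica-2.example"])
routes = [
    router.endpoint_for("SELECT * FROM orders"),
    router.endpoint_for("select count(*) from orders"),
    router.endpoint_for("INSERT INTO orders VALUES (1)"),
    router.endpoint_for("SELECT 1"),
]
print(routes)
```

Because replication is asynchronous, a read routed to a replica may briefly lag behind the source, so reads that must see the latest write are better sent to the source.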

[Diagram: within a Region, the write workload goes to the source database, which replicates asynchronously to two read replicas serving the read workloads]

Aurora read replicas and database auto-scaling

Up to 15 promotable read replicas across multiple Availability Zones

Redo log-based (physical) replication leads to low replica lag, typically < 10 ms

Custom reader endpoint with load balancing and auto-scaling

[Diagram: one master and three read replicas on a shared distributed storage volume, with the replicas grouped behind reader endpoint #1 and reader endpoint #2]


Aurora I/O profile

[Diagram: MySQL with replica. Primary instance in AZ 1 and replica instance in AZ 2, each writing to Amazon Elastic Block Store (Amazon EBS) with an EBS mirror, plus Amazon S3; write steps numbered 1 to 5.]

[Diagram: Amazon Aurora. Primary instance in AZ 1 and replica instances in AZ 2 and AZ 3; asynchronous 4/6 quorum distributed writes, plus Amazon S3.]

Type of write: binlog, data, double-write, log, FRM files

MySQL I/O profile for 30 min Sysbench run
• 780K transactions
• 7,388K I/Os per million txns (excludes mirroring, standby)
• Average 7.4 I/Os per transaction

Aurora I/O profile for 30 min Sysbench run
• 27,378K transactions (35X more)
• 0.95 I/Os per transaction, with 6X amplification (7.7X less)
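The two headline ratios follow directly from the figures quoted for the Sysbench runs. The short calculation below reproduces them; variable names are illustrative.

```python
mysql_transactions_k = 780
aurora_transactions_k = 27_378

mysql_ios_per_txn = 7_388_000 / 1_000_000  # 7,388K I/Os per million transactions
aurora_ios_per_txn = 0.95

throughput_ratio = aurora_transactions_k / mysql_transactions_k
io_ratio = mysql_ios_per_txn / aurora_ios_per_txn

print(f"Aurora ran {throughput_ratio:.1f}x as many transactions")
print(f"MySQL issued {io_ratio:.2f}x as many I/Os per transaction")
```

The results are roughly 35 and a little under 7.8, in line with the "35X more" and "7.7X less" labels.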

Aurora write and read throughput

Aurora MySQL is up to 5x faster than MySQL

[Charts: write throughput and read throughput for MySQL 5.6, MySQL 5.7, MySQL 8.0, Aurora 5.6, and Aurora 5.7]

Using Sysbench with 250 tables and 200,000 rows per table on R4.16XL

Availability


Automated, 0-RPO failover across AZs

• Each host manages set of EBS volumes with a full copy of the data
• Instances are monitored by an external observer to maintain consensus over quorum
• Failover initiated by automation or through RDS API
• Redirection to the new primary instance is provided through DNS

[Diagram, shown in four steps: the application connects through DNS to the primary (EC2 #1, EBS #1) in Availability Zone 1, with a standby (EC2 #2, EBS #2) in Availability Zone 2; after failover the standby becomes the primary and the former primary becomes the standby.]
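Because redirection happens through DNS, a client that resolves the endpoint again on every reconnect attempt will reach the new primary once the record changes. The sketch below shows that retry pattern with the resolver and connector passed in as functions so it can run without a network. All names, addresses, and the hostname are hypothetical.

```python
import time


def connect_with_retry(resolve, connect, hostname, attempts=5,
                       delay_seconds=1.0, sleep=time.sleep):
    """Retry a connection, resolving the hostname again on every attempt."""
    last_error = None
    for attempt in range(attempts):
        address = resolve(hostname)  # DNS may now point at the new primary
        try:
            return connect(address)
        except ConnectionError as error:
            last_error = error
            if attempt < attempts - 1:
                sleep(delay_seconds)
    raise ConnectionError(f"could not reach {hostname}") from last_error


# Simulated failover: the first lookup returns the old primary, the next the new one.
addresses = iter(["10.0.1.5", "10.0.2.7"])


def fake_resolve(hostname):
    return next(addresses)


def fake_connect(address):
    if address == "10.0.1.5":
        raise ConnectionError("old primary is down")
    return f"connected to {address}"


result = connect_with_retry(
    fake_resolve, fake_connect, "mydb.example.internal", sleep=lambda seconds: None
)
print(result)
```

In a real client the same effect depends on not caching DNS answers longer than the record's time to live.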

Aurora instant crash recovery

Traditional database
• Have to replay logs since the last checkpoint
• Typically 5 minutes between checkpoints
• Single-threaded in MySQL; requires that the disk be accessed many times
• Crash at T0 requires a re-application of the SQL in the redo log since last checkpoint

Amazon Aurora
• Underlying storage replays redo records on demand as part of a disk read
• Parallel, distributed, asynchronous
• No replay for startup
• Crash at T0 will result in redo logs being applied to each segment on demand, in parallel, asynchronously

[Diagram: timeline of checkpointed data and redo log up to a crash at T0, for each approach]
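The difference between replaying the log at startup and applying redo records on demand can be shown with a toy model. The sketch below is a single-process illustration only. It does not reflect Aurora's distributed, parallel storage implementation, and the class name and data layout are invented.

```python
class LazyRedoStore:
    """Toy page store that applies redo records when a page is read."""

    def __init__(self, checkpointed_pages, redo_log):
        self.pages = dict(checkpointed_pages)
        self.pending = {}
        # Startup only groups records by page; no page is rewritten yet.
        for page_id, value in redo_log:
            self.pending.setdefault(page_id, []).append(value)
        self.applied = 0

    def read(self, page_id):
        # Apply this page's outstanding redo records, in log order, on first read.
        for value in self.pending.pop(page_id, []):
            self.pages[page_id] = value
            self.applied += 1
        return self.pages[page_id]


store = LazyRedoStore(
    checkpointed_pages={"a": 1, "b": 2},
    redo_log=[("a", 10), ("b", 20), ("a", 11)],
)
applied_at_startup = store.applied
value_a = store.read("a")
applied_after_first_read = store.applied
print(applied_at_startup, value_a, applied_after_first_read)
```

Startup does no replay work in this model; the cost of applying a record is paid only when the affected page is first read.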

Disaster Recovery


Planning for disaster recovery

Use a cross-region read replica as a standby database for recovery in the event of a disaster

Read replicas can be configured for multi-AZ to reduce recovery time

Can use delayed replication for MySQL to protect from self-inflicted disasters

[Diagram: a source database (primary and secondary, synchronous replication) in Region 1 and a cross-Region replica (primary and secondary, synchronous replication) in Region 2, spread over Availability Zones 1 to 4, with asynchronous replication between the Regions]

Aurora Global Database

Faster disaster recovery and enhanced data locality

Global replication: Up to 5 secondary regions

High throughput: Up to 200K writes/sec

Low replica lag: < 1 sec cross-region lag

Fast recovery: < 1 min downtime after region unavailability

[Diagram: one primary region with outbound replication and three secondary regions with inbound replication, each with its own instances and storage; regions shown: Oregon, Ohio, Northern Virginia, Ireland]

What do you want from your database service?

Security


How do I secure my database?

Amazon Virtual Private Cloud (Amazon VPC): Network isolation

AWS Identity and Access Management (IAM): Resource-level permission controls

AWS Key Management Service (AWS KMS): Encryption at rest

SSL: Protection for data in transit

Password validation

• Supported on RDS MySQL versions 5.6, 5.7, and 8.0

• The MySQL DB instance performs password validation

• Install plugin

mysql> INSTALL PLUGIN validate_password SONAME 'validate_password.so';

Query OK, 0 rows affected (0.01 sec)

• Configure the parameter group and reboot

Setting up IAM permissions through the console

• Add a pre-existing IAM role to the database, or

• Select an AWS service to integrate with Aurora (NEW)

RDS-managed MySQL engines

Standard: The open-source standard MySQL

Community: The popular community choice

Performance: The fastest MySQL-compatible engine

Related breakouts

DAT202 What's new in Amazon Aurora

DAT207 What’s new in Amazon RDS

DAT215 Build high-performing apps on Amazon Aurora Serverless with RDS Data API

DAT309 Amazon Aurora storage demystified: How it all works

DAT321 Deep dive on Amazon Aurora with MySQL compatibility

DAT327 Accelerating application development with Amazon Aurora

DAT346 Planning proof of concept with Amazon Aurora

DAT367 Running SQL Server on Amazon RDS and migrating to Aurora MySQL


Learn databases with AWS Training and Certification

Resources created by the experts at AWS to help you build and validate database skills

25+ free digital training courses cover topics and services related to databases, including:

• Amazon Aurora
• Amazon Neptune
• Amazon DocumentDB
• Amazon DynamoDB
• Amazon ElastiCache
• Amazon Redshift
• Amazon RDS

Validate expertise with the new AWS Certified Database - Specialty beta exam

Visit aws.training

Thank you!


Yoav Eilat

[email protected]

Chayan Biswas

[email protected]
