Introduction to AWS Database Services
DESCRIPTION
AWS offers customers a range of different database options. These include Amazon DynamoDB, a fully-managed NoSQL database service that makes it simple and cost-effective to store and retrieve any amount of data as well as Amazon Relational Database Service (RDS), a service that makes it easy to set up, operate, and scale a relational database in the cloud with support for MySQL, Microsoft SQL Server, PostgreSQL, and Oracle Database. In this session you’ll get an overview of AWS database options and how they might help support your application and see how to get started.
TRANSCRIPT
© 2014 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
Introduction to Database Services
Brian Rice, Product Marketing Manager, Amazon RDS
March 26, 2014
Today’s agenda
• Why managed database services?
• A non-relational managed database
• A relational managed database
• A managed in-memory cache
• A managed data warehouse
• What to do next
If you host your databases on-premises
Power, HVAC, net
Rack & stack
Server maintenance
OS patches
DB s/w patches
Database backups
Scaling
High availability
DB s/w installs
OS installation
you
App optimization
If you host your databases in
Amazon EC2
OS patches
DB s/w patches
Database backups
Scaling
High availability
DB s/w installs
you
App optimization
Power, HVAC, net
Rack & stack
Server maintenance
OS installation
If you choose a managed DB service
Power, HVAC, net
Rack & stack
Server maintenance
OS patches
DB s/w patches
Database backups
App optimization
High availability
DB s/w installs
OS installation
you
Scaling
The self-managed vs. AWS-managed decision
Self-managed database | AWS-managed database
You have full responsibility for upgrades and backup | Upgrades, backup, and failover are provided as a service
You have full responsibility for security | AWS provides high infrastructure security, certifications; gives you tools to ensure DB security
Full control over parameters of server, OS, and database | Database is a managed appliance, so you can easily automate
Replication is expensive, complex and requires a lot of engineering | Failover is a packaged service
A Managed Service for Each Major DB Type
Amazon DynamoDB
Key-value
Amazon RDS
SQL
Amazon ElastiCache
In-memory cache
Amazon Redshift
Data warehouse
What is Amazon DynamoDB?
DynamoDB: a managed key-value store
• Simple and fast to deploy
• Simple and fast to scale
– To millions of IOPS
• Data is automatically replicated
• Fast, predictable performance
– Backed by SSD storage
• Secondary indexes offer fast lookups
• No cost to get started; pay only for what you consume
DynamoDB
SmugMug relies on DynamoDB
• Millions of customers use SmugMug’s photo and video sharing service to store billions of valuable photos and videos
• DynamoDB lets SmugMug scale quickly, frees engineers from database management
“DynamoDB is a truly revolutionary product … I love how DynamoDB enables us to provision our desired throughput, and achieve low latency and seamless scale, even with our constantly growing workloads.” -Don MacAskill, CEO
DynamoDB: schemaless database
table items
Attributes (name/value pairs)
DynamoDB: each item must include a key
Hash key (DynamoDB maintains an unordered index)
DynamoDB: each item must include a key
Hash key
Range key (DynamoDB maintains a sorted index)
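The two key slides can be illustrated with a toy in-memory model. This is a minimal sketch, not DynamoDB's actual storage engine: the class `ToyTable` and its methods are hypothetical names, and it assumes only what the slides state, namely that items are located by hash key with no ordering, and that range keys are kept in a sorted index within each hash key.

```python
import bisect
from collections import defaultdict


class ToyTable:
    """Toy model of a hash-and-range keyed table (illustration only)."""

    def __init__(self):
        # hash key -> list of (range key, item) pairs, kept sorted by range key
        self._partitions = defaultdict(list)

    def put_item(self, hash_key, range_key, item):
        rows = self._partitions[hash_key]
        keys = [rk for rk, _ in rows]
        pos = bisect.bisect_left(keys, range_key)
        if pos < len(rows) and rows[pos][0] == range_key:
            rows[pos] = (range_key, item)        # same full key: overwrite
        else:
            rows.insert(pos, (range_key, item))  # keep range keys sorted

    def query(self, hash_key, low, high):
        """Return items under one hash key whose range key is in [low, high]."""
        rows = self._partitions.get(hash_key, [])
        keys = [rk for rk, _ in rows]
        start = bisect.bisect_left(keys, low)
        end = bisect.bisect_right(keys, high)
        return [item for _, item in rows[start:end]]


table = ToyTable()
table.put_item("alice", 3, {"photo": "c.jpg"})
table.put_item("alice", 1, {"photo": "a.jpg"})
table.put_item("alice", 2, {"photo": "b.jpg"})
first_two = table.query("alice", 1, 2)  # range query within one hash key
```

Because the range keys are sorted, a range query only needs two binary searches inside a single hash key's partition, which is the intuition behind the sorted index on the slide.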
DynamoDB: Local Secondary Indexes = alternate range keys
Hash key
Range key
LSI key
DynamoDB: Global Secondary Indexes = “Pivot Charts” for your table
Choose which attributes to project (if any)
DynamoDB: provision throughput
Read capacity units
Write capacity units
DynamoDB: What are capacity units?
One write capacity unit: 1 write per sec of up to 1KB
One read capacity unit: 1 read per sec of up to 4KB
Eventually consistent reads at 50% off!
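As a rough sizing aid, the capacity-unit definitions on this slide can be turned into a small calculator. This is a sketch under stated assumptions: the function names are hypothetical, item sizes are assumed to round up to the next 1 KB (writes) or 4 KB (reads) boundary, and "50% off" is read as eventually consistent reads consuming half the units.

```python
import math

WRITE_UNIT_BYTES = 1024      # one write unit: 1 write per sec of up to 1 KB
READ_UNIT_BYTES = 4 * 1024   # one read unit: 1 read per sec of up to 4 KB


def write_capacity_units(item_bytes, writes_per_sec):
    # Assumption: larger items consume one unit per started 1 KB.
    units_per_write = math.ceil(item_bytes / WRITE_UNIT_BYTES)
    return units_per_write * writes_per_sec


def read_capacity_units(item_bytes, reads_per_sec, eventually_consistent=False):
    # Assumption: larger items consume one unit per started 4 KB.
    units_per_read = math.ceil(item_bytes / READ_UNIT_BYTES)
    units = units_per_read * reads_per_sec
    if eventually_consistent:
        units = math.ceil(units / 2)  # "50% off" per the slide
    return units


writes_needed = write_capacity_units(item_bytes=1536, writes_per_sec=10)
reads_needed = read_capacity_units(item_bytes=3000, reads_per_sec=100,
                                   eventually_consistent=True)
```

Under these assumptions a 1.5 KB item costs two write units per write, so ten writes per second need 20 units, while 100 eventually consistent reads per second of a 3,000-byte item need 50 read units.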
Manage tables: CreateTable, UpdateTable, DeleteTable, DescribeTable, ListTables
“Select”, “insert”, “update” items: PutItem, GetItem, UpdateItem, DeleteItem
Bulk select or update (max 1MB): BatchGetItem, BatchWriteItem
Query specific items OR scan the full table: Query, Scan
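To make the table-management operations concrete, here is a sketch that builds the parameters of a CreateTable request as a plain Python dictionary. The helper `create_table_params`, the table name `Photos`, and the attribute names are hypothetical; the dictionary keys (`TableName`, `KeySchema`, `AttributeDefinitions`, `ProvisionedThroughput`) follow the CreateTable request structure. Nothing is sent to AWS here.

```python
def create_table_params(table_name, hash_key, range_key=None,
                        read_units=10, write_units=5):
    """Build keyword arguments for a CreateTable request.

    hash_key and range_key are (attribute name, attribute type) pairs,
    where the type is "S" (string), "N" (number) or "B" (binary).
    """
    key_schema = [{"AttributeName": hash_key[0], "KeyType": "HASH"}]
    attributes = [{"AttributeName": hash_key[0], "AttributeType": hash_key[1]}]
    if range_key is not None:
        key_schema.append({"AttributeName": range_key[0], "KeyType": "RANGE"})
        attributes.append(
            {"AttributeName": range_key[0], "AttributeType": range_key[1]}
        )
    return {
        "TableName": table_name,
        "KeySchema": key_schema,
        "AttributeDefinitions": attributes,
        "ProvisionedThroughput": {
            "ReadCapacityUnits": read_units,
            "WriteCapacityUnits": write_units,
        },
    }


# Hypothetical table: one user's photos, sorted by upload time.
params = create_table_params("Photos", ("UserId", "S"), ("UploadedAt", "N"))
```

With an AWS SDK installed and credentials configured, a dictionary like this could be passed to the SDK's create-table call, for example `boto3.client("dynamodb").create_table(**params)`. Only the key attributes are declared, since the table is otherwise schemaless.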
DynamoDB is optimized for developer productivity
• Scale up for service levels; scale down for cost
How customers autoscale with DynamoDB
Credits: tadaa; Sebastian Dahlgren
DynamoDB: simple app architecture
Elastic Load Balancing
Amazon EC2 App Instances
Clients
DynamoDB
Auth & business logic
DynamoDB: serverless app architecture
Clients
AWS Client JS
Login with Amazon, Facebook, or
Auth via Web Identity Federation
Business logic
Fine-grained access control
How DynamoDB billing works
Monthly bill = storage consumed in GB (plus 100 bytes per item) + charge for write capacity units per hour + charge for read capacity units per hour
≈ 5 GB * $0.25 + 21 * 720 hrs * $0.0065/10 + 35 * 720 hrs * $0.0065/50
≈ $14.36
Assumes DB instance accessed only from AWS region
Further details at http://aws.amazon.com/dynamodb/pricing/
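The billing example can be checked with a few lines of arithmetic. This sketch uses only the figures on the slide (5 GB at $0.25, 21 write units, 35 read units, 720 hours, $0.0065 per hour per 10 write units or per 50 read units); the function and parameter names are hypothetical, and the prices are the slide's 2014 figures, not current ones.

```python
HOURS_PER_MONTH = 720


def dynamodb_monthly_bill(storage_gb, write_units, read_units,
                          gb_price=0.25, block_price=0.0065,
                          writes_per_block=10, reads_per_block=50):
    """Return the (storage, write, read) charges in dollars for one month."""
    storage = storage_gb * gb_price
    writes = write_units * HOURS_PER_MONTH * block_price / writes_per_block
    reads = read_units * HOURS_PER_MONTH * block_price / reads_per_block
    return storage, writes, reads


storage, writes, reads = dynamodb_monthly_bill(5, 21, 35)
total = storage + writes + reads
# Rounding each charge to whole cents before summing.
total_rounded_per_charge = sum(round(x, 2) for x in (storage, writes, reads))
```

The three charges come to $1.25, about $9.83 and about $3.28. Summed without rounding they give roughly $14.35; rounding each charge to cents first gives the slide's $14.36.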
What is Amazon Relational Database Service?
Amazon RDS: a managed SQL service
• Simple and fast to deploy
• Simple and fast to scale
• AWS handles patching, backups, replication
• Compatible with your applications
– Choose among MySQL, PostgreSQL, Oracle, SQL Server
• Fast, predictable performance
• No cost to get started; pay only for what you consume
Amazon RDS
Flipboard relies on Amazon RDS
• Flipboard is an online magazine with millions of users and billions of “flips” per month
• Uses Amazon RDS and its Multi-AZ capabilities to store mission critical user data
"We were able to go from concept to delivered product in about six months with just a handful of engineers." -Greg Scallan, Chief Architect
How Amazon RDS delivers high performance
• Choose Provisioned IOPS storage for high, predictable performance
– Provision up to 3 TB storage and 30,000 IOPS per database instance
– Scale IOPS up or down online
• Choose a database instance type with the right amount of CPU and memory
How Amazon RDS backups work
• Automated backups
– Restore your database to a point in time
– Enabled by default
– Choose a retention period, up to 35 days
• Manual snapshots
– Initiated by you
– Persist until you delete them
– Stored in Amazon Simple Storage Service (Amazon S3)
– Build a new database instance from a snapshot when needed
Choose Multi-AZ for greater availability, durability
• An Availability Zone is a physically distinct, independent infrastructure
• With Multi-AZ operation, your database is synchronously replicated to another zone in the same AWS region
• Failover occurs automatically in response to the most important failure scenarios
• Planned maintenance is applied first to the backup (standby) instance
Choose Read Replicas for greater scalability
• Help your app scale by offloading read traffic to
an automatically maintained read replica
• Create multiple read replicas, load-share traffic
• Easy to set up
Native
MySQL
RDS
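On the application side, offloading read traffic usually means deciding per statement which endpoint to use. The following is a minimal sketch of that idea, not an RDS feature: the class `ReadWriteRouter` and the endpoint strings are hypothetical, and the statement check is deliberately naive (anything starting with SELECT counts as a read).

```python
import itertools


class ReadWriteRouter:
    """Send writes to the primary and rotate reads across read replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Fall back to the primary when no replicas are configured.
        self._readers = itertools.cycle(replicas or [primary])

    def endpoint_for(self, sql):
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._readers) if is_read else self.primary


router = ReadWriteRouter(
    primary="primary.example.internal",         # hypothetical endpoints
    replicas=["replica-1.example.internal",
              "replica-2.example.internal"],
)
read_targets = [router.endpoint_for("SELECT * FROM users") for _ in range(3)]
write_target = router.endpoint_for("UPDATE users SET name = 'x' WHERE id = 1")
```

Round-robin is only one way to load-share; a real application would also need to tolerate replicas lagging slightly behind the primary, since replication to read replicas is asynchronous.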
Choose cross-region snapshot copy for even greater durability, ease of migration
• Copy a database snapshot to a different AWS region
• Warm standby for disaster recovery
• Or use it as a base for migration to a different region
Choose cross-region read replicas for enhanced data locality, even more ease of migration
• Even faster recovery in the event of disaster
• Bring data close to your customers
• Promote to a master for easy migration
How to scale with Amazon RDS
• Scale up or down with resizable instance types
• Scale your storage up with a few clicks while online
• Offload read traffic to read replicas
• Put a cache in front of Amazon RDS
– Amazon ElastiCache for Memcached or Redis
– Or your favorite cache, self-managed in Amazon EC2
• Amazon RDS takes some of the pain out of sharding
NoSQL vs. SQL for a new app: how to choose?
Amazon DynamoDB
• Want simplest possible DB management?
• Dataset grows without bound?
Amazon RDS
• Need joins, transactions, frequent table scans?
• Team has SQL skills?
• Dataset growth manageable?
How Amazon RDS billing works
Monthly bill = instance hours (db.m3.xlarge; MySQL; Oregon; Single-AZ; On-Demand) + GB of Provisioned IOPS storage (100 GB) + provisioned IOPS (1,000 IOPS)
= 720 hrs * $0.60 + 100 GB * $0.125 + 1,000 IOPS * $0.10
= $544.50
Assumes DB instance accessed only from Amazon EC2
Further details at http://aws.amazon.com/rds/pricing/
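The RDS example adds three independent charges: instance hours, provisioned storage, and provisioned IOPS. A short sketch reproduces the slide's arithmetic; the function name is hypothetical and the rates are the slide's 2014 example figures.

```python
def rds_monthly_bill(hours, hourly_rate, storage_gb, gb_rate, iops, iops_rate):
    """Instance hours + provisioned storage + provisioned IOPS, in dollars."""
    instance = hours * hourly_rate
    storage = storage_gb * gb_rate
    provisioned_iops = iops * iops_rate
    return instance + storage + provisioned_iops


# db.m3.xlarge, MySQL, Oregon, Single-AZ, On-Demand (figures from the slide)
bill = rds_monthly_bill(hours=720, hourly_rate=0.60,
                        storage_gb=100, gb_rate=0.125,
                        iops=1000, iops_rate=0.10)
```

The terms are $432.00, $12.50 and $100.00, which sum to the slide's $544.50. In this example the instance hours dominate the bill.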
What is Amazon ElastiCache?
ElastiCache: resizable in-memory cache
• High performance, resizable in-memory caching
• Speed your application by bypassing database access and disk storage
• Compatible with your existing applications
– Choose between the popular Memcached and Redis engines
ElastiCache
2U relies on ElastiCache
• 2U, Inc., is a “School As A Service” provider that helps universities take their degrees online.
• To support collaboration and learning, the company’s technology platform uses ElastiCache to cache data that grows exponentially as students communicate with instructors and with each other.
• ElastiCache is used to cache news feeds and data from RDS MySQL.
“ElastiCache helps us specifically a lot around our social and collaborative tools … It just works. We don’t even know it’s there.” -James Kenigsberg, Chief Technology Officer
Use Cases for ElastiCache
• Performance or cost optimization of an underlying database
• Storage of ephemeral key-value data
• High-performance application patterns
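The first use case, optimizing an underlying database, is commonly implemented with the cache-aside pattern: check the cache, fall back to the database on a miss, then populate the cache. The sketch below is an illustration only. A plain dictionary stands in for a Memcached or Redis node, and the class `CacheAside` and the `load_from_db` callback are hypothetical names.

```python
class CacheAside:
    """Read-through helper: serve from cache, load from the database on a miss."""

    def __init__(self, load_from_db):
        self._load_from_db = load_from_db  # slow path, e.g. a SQL query
        self._cache = {}                   # stand-in for Memcached/Redis
        self.db_reads = 0                  # counts how often the database is hit

    def get(self, key):
        if key in self._cache:
            return self._cache[key]        # hit: the database is bypassed
        self.db_reads += 1
        value = self._load_from_db(key)
        self._cache[key] = value
        return value

    def invalidate(self, key):
        """Drop a cached entry, for example after the row is updated."""
        self._cache.pop(key, None)


database = {"user:1": "Ann"}               # stand-in for an RDS table
cache = CacheAside(database.get)
first = cache.get("user:1")                # miss: reads the database
second = cache.get("user:1")               # hit: served from the cache
```

Invalidating on write keeps the sketch simple; real deployments often add expiry times as well, since cached data is ephemeral by design.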
ElastiCache: simple app architecture
Elastic Load Balancing
Amazon EC2 App Instances
Clients
Amazon RDS
ElastiCache
How ElastiCache billing works
Monthly bill = N nodes × hours × hourly price per node (Standard large; Oregon; On-Demand)
= 4 nodes * 720 hrs * $0.31
= $892.80
Further details at http://aws.amazon.com/elasticache/pricing/
What is Amazon Redshift?
Amazon Redshift: a managed data warehouse
• Petabyte-scale columnar database
• Fast response time: roughly 10x faster than typical relational stores
• Pricing as low as $1,000 per TB per year
Amazon Redshift
Foursquare relies on Amazon Redshift
• More than 40 million people worldwide use Foursquare to meet up with friends, exchange travel tips, and find money-saving deals
• Foursquare uses AWS to perform analytics across millions of daily check-ins, saving licensing fees and redeploying its dev/ops staff on more strategic work
“Amazon Redshift offers the performance we needed while freeing us from the licensing costs of our previous solution.” -Jon Hoffman, Software Engineer
Use Cases for Amazon Redshift
Traditional Enterprise DW
• Reduce costs by extending DW rather than adding HW
• Migrate completely from existing DW systems
• Respond faster to business; provision in minutes
Companies with Big Data
• Improve performance by an order of magnitude
• Make more data available for analysis
• Access business data via standard reporting tools
SaaS Companies
• Add analytic functionality to applications
• Scale DW capacity as demand grows
• Reduce HW & SW costs by an order of magnitude
Amazon Redshift architecture
Leader node
Compute nodes
Existing business
intelligence tools
PostgreSQL
JDBC/ODBC
Amazon S3
Amazon DynamoDB
AWS Data
Pipeline
Amazon Redshift dramatically reduces I/O
• Column storage
• Data compression
• Zone maps
• Direct-attached storage
• With row storage you do unnecessary I/O
• To get total amount, you have to read everything
• With column storage, you only read the data you need
ID  | Age | State | Amount
----+-----+-------+--------
123 | 20  | CA    | 500
345 | 25  | WA    | 250
678 | 40  | FL    | 125
957 | 37  | WA    | 375
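The difference between the two layouts can be shown with the four rows from the slide. This is a simplified in-memory sketch, not how Redshift stores data on disk: it counts "values read" as a stand-in for I/O, and the variable names are hypothetical.

```python
rows = [
    (123, 20, "CA", 500),
    (345, 25, "WA", 250),
    (678, 40, "FL", 125),
    (957, 37, "WA", 375),
]

# Row layout: each row is stored together, so totalling Amount
# means reading every field of every row.
row_values_read = sum(len(row) for row in rows)
row_total = sum(row[3] for row in rows)

# Column layout: each column is stored together, so only the
# Amount column has to be read.
columns = {
    "ID": [r[0] for r in rows],
    "Age": [r[1] for r in rows],
    "State": [r[2] for r in rows],
    "Amount": [r[3] for r in rows],
}
col_values_read = len(columns["Amount"])
col_total = sum(columns["Amount"])
```

Both layouts give the same total of 1250, but the row layout touches 16 values and the column layout only 4. The gap widens as tables gain more columns.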
Amazon Redshift dramatically reduces I/O
• Column storage
• Data compression
• Zone maps
• Direct-attached storage
analyze compression listing;
Table | Column | Encoding
---------+----------------+----------
listing | listid | delta
listing | sellerid | delta32k
listing | eventid | delta32k
listing | dateid | bytedict
listing | numtickets | bytedict
listing | priceperticket | delta32k
listing | totalprice | mostly32
listing | listtime | raw
Slides not intended for redistribution.
Amazon Redshift dramatically reduces I/O
• Column storage
• Data compression
• Zone maps
• Direct-attached storage
• COPY compresses automatically
• You can analyze and override
• More performance, less cost
Amazon Redshift dramatically reduces I/O
• Column storage
• Data compression
• Zone maps
• Direct-attached storage
Block 1: 10 | 13 | 14 | 26 | … | 100 | 245 | 324 (min 10, max 324)
Block 2: 375 | 393 | 417 | … | 512 | 549 | 623 (min 375, max 623)
Block 3: 637 | 712 | 809 | … | 834 | 921 | 959 (min 637, max 959)
• Track the minimum and maximum value for each block
• Skip over blocks that don’t contain relevant data
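The zone-map idea can be sketched in a few lines. The blocks below contain only the values visible on the slide (the elided values are left out rather than guessed), and the function name `scan_equals` is hypothetical. This illustrates the technique and makes no claim about Redshift's internal implementation.

```python
# Only the values shown on the slide; elided values are omitted.
blocks = [
    [10, 13, 14, 26, 100, 245, 324],
    [375, 393, 417, 512, 549, 623],
    [637, 712, 809, 834, 921, 959],
]

# Zone map: the minimum and maximum value of each block.
zone_map = [(min(block), max(block)) for block in blocks]


def scan_equals(target):
    """Return (matching values, number of blocks actually read)."""
    hits, blocks_read = [], 0
    for (low, high), block in zip(zone_map, blocks):
        if target < low or target > high:
            continue  # skip: this block cannot contain the target
        blocks_read += 1
        hits.extend(value for value in block if value == target)
    return hits, blocks_read


found, blocks_touched = scan_equals(549)
```

A lookup for 549 reads one block out of three, and a lookup for a value that falls between blocks, such as 350, reads none at all.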
Amazon Redshift dramatically reduces I/O
• Column storage
• Data compression
• Zone maps
• Direct-attached storage
DW.HS1.8XL:
• > 2 GB/s scan rate
• Optimized for data processing
• High disk density
DW.HS1.XL:
Amazon Redshift: start small and grow big
Dense Storage Node (dw1.xlarge)
2 TB, 16 GB RAM, 2 cores
Dense Compute Node (dw2.large)
0.16 TB, 16 GB RAM, 2 cores
Single Node (2 TB)
Cluster 2-32 Nodes (up to 64 TB)
8XL Dense Storage Node (dw1.8xlarge)
16 TB, 128 GB RAM, 16 cores, 10 GigE
8XL Dense Compute Node (dw2.8xlarge)
2.56 TB, 128 GB RAM, 16 cores, 10 GigE
Cluster 2-100 Nodes (up to 1.6 PB)
Note: Nodes not to scale
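The cluster limits on this slide follow from per-node storage multiplied by node count. The sketch below checks the two "up to" figures, assuming the 32-node limit applies to the smaller node types and the 100-node limit to the 8XL types, and using decimal units (1 PB = 1,000 TB). The names `NODE_STORAGE_TB` and `cluster_capacity_tb` are hypothetical.

```python
# Storage per node in TB, as listed on the slide.
NODE_STORAGE_TB = {
    "dw1.xlarge": 2,
    "dw1.8xlarge": 16,
    "dw2.large": 0.16,
    "dw2.8xlarge": 2.56,
}


def cluster_capacity_tb(node_type, node_count):
    """Total raw storage of a cluster in TB."""
    return NODE_STORAGE_TB[node_type] * node_count


small_max_tb = cluster_capacity_tb("dw1.xlarge", 32)    # 2 TB x 32 nodes
large_max_tb = cluster_capacity_tb("dw1.8xlarge", 100)  # 16 TB x 100 nodes
large_max_pb = large_max_tb / 1000
```

These give 64 TB and 1.6 PB, matching the slide's cluster limits for the dense storage nodes.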
How Amazon Redshift billing works
Monthly bill = N nodes × hours × hourly price per node (dw2.large; Oregon; On-Demand)
= 4 nodes * 720 hrs * $0.25
= $720
Further details at http://aws.amazon.com/redshift/pricing/
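Node-based services such as Amazon Redshift and ElastiCache share the same billing shape in these examples: nodes times hours times an hourly rate. A small sketch reproduces both example bills from this deck; the function name is hypothetical and the rates are the 2014 example figures from the slides.

```python
def node_hours_bill(nodes, hours, hourly_rate):
    """Monthly On-Demand charge for a cluster of identical nodes, in dollars."""
    return nodes * hours * hourly_rate


redshift_bill = node_hours_bill(nodes=4, hours=720, hourly_rate=0.25)     # dw2.large
elasticache_bill = node_hours_bill(nodes=4, hours=720, hourly_rate=0.31)  # Standard large
```

These evaluate to $720.00 and $892.80, matching the Redshift and ElastiCache billing slides.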
To sum up…
Review: AWS Managed Database Services
Amazon DynamoDB
Key-value
Amazon RDS
SQL
Amazon ElastiCache
In-memory cache
Amazon Redshift
Data warehouse
Benefits of AWS Database Services
Pay only for what you use
No up-front cost
Fully managed services
AWS handles installs, patching, restarts
Easy to scale
Grow as you need
Designed for use with other AWS services
Amazon EC2
Amazon S3
Amazon CloudWatch
Amazon SNS
Amazon VPC
AWS Data Pipeline
Pace of Innovation – a Bonus
Amazon RDS team launched 25+ features
• Amazon RDS/PostgreSQL, new M3 instances
• Amazon RDS/SQL Server TDE, version upgrade
• Amazon RDS/Oracle TDE, 3TB/30K IOPS, version upgrade
• Cross-region snapshot copy, parallel replica, chained replica
• Multi-AZ SLA, log access, VPC groups, …
NoSQL team launched 12+ features
• ElastiCache: Redis 2.8.6 engine support
• DynamoDB cross-region import/export, fine-grained access control, global secondary indexes
• DynamoDB local, geospatial indexing library
Amazon Redshift team launched 21+ features
• Amazon Redshift dense compute nodes
• Encryption with HSM support
• Audit logging, Amazon SNS notification, snapshot sharing
• Cross-region Amazon Redshift automatic backups
• Faster resize, improved concurrency, distributed tables…
AWS Marketplace
• Find software to use with Amazon RDS, Amazon Redshift, DynamoDB, and ElastiCache
• One-click deployments
• Flexible pricing options
• http://aws.amazon.com/marketplace
Try AWS database services for free
Service | Free to new customers every month for one year
DynamoDB | 100 MB of storage; 5 units of write capacity; 10 units of read capacity
Amazon RDS | 750 Micro DB instance hours; 20 GB of DB Storage; 20 GB for Backups; 10 million I/O operations
ElastiCache | 750 Micro Cache Node instance hours
Thank you