Serverless Realtime Backup and Restore of DynamoDB with AWS Lambda Ian Meyers, Principal Solution Architect Amazon Web Services EMEA

Post on 12-Apr-2017


Page 1: Serverless Realtime Backup

Serverless Realtime Backup and Restore of DynamoDB with AWS Lambda

Ian Meyers, Principal Solution Architect

Amazon Web Services EMEA

Page 2: Serverless Realtime Backup

Not your regular Serverless Talk

!Slackbots

!Pizza

!Alexa

!Connected Home

!Driverless Cars

!Smart Cities

Page 3: Serverless Realtime Backup

!

Page 4: Serverless Realtime Backup
Page 5: Serverless Realtime Backup

Computational Primitives

Logic: AWS Lambda

State: Amazon DynamoDB, Amazon S3

Message Passing: Amazon API Gateway, Amazon Kinesis, Amazon SNS

Page 6: Serverless Realtime Backup

Serverless Compute

AWS Lambda

Fully Managed Event Processor (Node.js, Python, or Java)

Natively Compile & Install Any Type Of Dependency

Specify Runtime RAM & Timeout

Automatically Scaled to support Event Volume

Integrated CloudWatch Logging

REST Interface with API Gateway

AWS service categories: Compute, Storage, Database, Networking, Analytics, App Services, Deployment & Administration, all on the AWS Global Infrastructure

Page 7: Serverless Realtime Backup

Amazon Kinesis

Managed Service for Real Time Big Data Processing

Create Streams to Produce & Consume Data

Elastically Add and Remove Shards for Performance

Use Kinesis Client Library (KCL) to Process Data

Integration with S3, Redshift and DynamoDB

Serverless Messaging


Page 8: Serverless Realtime Backup

• Zero administration: Capture and deliver streaming data into S3, Redshift, and other destinations without writing an application or managing infrastructure.

• Direct-to-data store integration: Batch, compress, and encrypt streaming data for delivery into data destinations in as little as 60 seconds using simple configurations.

• Seamless elasticity: Scales to match data throughput without intervention.

Capture and submit streaming data to Firehose

Firehose loads streaming data continuously into S3 and Redshift

Analyze streaming data using your favorite BI tools


Serverless Stream Archive

Page 9: Serverless Realtime Backup

Serverless Database

DynamoDB

Provisioned throughput NoSQL database

Fast, predictable, configurable performance

Fully distributed, fault tolerant HA architecture

Update Streams provide DB Notifications

Integration with EMR & Hive

Database services: RDS, DynamoDB, Redshift, ElastiCache

Page 10: Serverless Realtime Backup

DynamoDB Internals - Partitions

DynamoDB Table, partitioned by Hash Range (0000 to NNNN)

Per partition: 1000 Write IOPS* and 3000 Read IOPS**

* 1KB Write

** 4KB Read
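The per-partition limits above imply a rough lower bound on how many partitions a table needs for its provisioned throughput. The sketch below is only an illustration of that arithmetic: the function name is hypothetical, and it assumes partitions are driven by throughput alone, ignoring table size, which also affects partitioning.

```python
import math

# Per-partition limits as quoted on the slide.
WRITE_IOPS_PER_PARTITION = 1000  # 1 KB writes
READ_IOPS_PER_PARTITION = 3000   # 4 KB reads


def estimate_partitions(read_units: int, write_units: int) -> int:
    """Rough partition count implied by provisioned throughput alone.

    Assumption: storage size is ignored, so this is a lower bound at best.
    """
    load = (read_units / READ_IOPS_PER_PARTITION
            + write_units / WRITE_IOPS_PER_PARTITION)
    # A table always has at least one partition.
    return max(1, math.ceil(load))


if __name__ == "__main__":
    print(estimate_partitions(read_units=6000, write_units=2000))
```

Because each partition has its own Update Stream shard (next slide), this number also hints at how many stream shards a backup function would read from.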

Page 11: Serverless Realtime Backup

Partitions are three-way replicated

Items stored in the table:

Id = 1, Name = Jim

Id = 2, Name = Andy, Dept = Engg

Id = 3, Name = Kim, Dept = Ops

Partitions: Partition 1, Partition 2, Partition N

Each partition is replicated to Facility 1, Facility 2 and Facility 3.

Page 12: Serverless Realtime Backup

DynamoDB Internals - Streams

DynamoDB Table

Hash Range 0000 to NNNN: Update Stream, 2MB/sec

Hash Range NNNN to MMMM: Update Stream, 2MB/sec

Stream events: INSERT, UPDATE, DELETE
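A Lambda function attached to an Update Stream receives these changes as a batch of records. The sketch below shows one possible shape for reading them; `iter_changes`, `handler` and `SAMPLE_EVENT` are illustrative names and data, not taken from the talk. Note that the stream records themselves carry the `eventName` values `INSERT`, `MODIFY` and `REMOVE` for the three change types listed on the slide.

```python
from collections import Counter

# Hypothetical sample batch in the shape Lambda receives from a DynamoDB stream.
SAMPLE_EVENT = {
    "Records": [
        {"eventName": "INSERT",
         "dynamodb": {"Keys": {"Id": {"N": "1"}},
                      "NewImage": {"Id": {"N": "1"}, "Name": {"S": "Jim"}},
                      "SequenceNumber": "100"}},
        {"eventName": "MODIFY",
         "dynamodb": {"Keys": {"Id": {"N": "1"}},
                      "OldImage": {"Id": {"N": "1"}, "Name": {"S": "Jim"}},
                      "NewImage": {"Id": {"N": "1"}, "Name": {"S": "James"}},
                      "SequenceNumber": "101"}},
        {"eventName": "REMOVE",
         "dynamodb": {"Keys": {"Id": {"N": "1"}},
                      "OldImage": {"Id": {"N": "1"}, "Name": {"S": "James"}},
                      "SequenceNumber": "102"}},
    ]
}


def iter_changes(event):
    """Yield one flat dict per stream record, in the order received."""
    for record in event.get("Records", []):
        change = record.get("dynamodb", {})
        yield {
            "eventName": record["eventName"],
            "Keys": change.get("Keys"),
            "NewImage": change.get("NewImage"),  # absent on REMOVE
            "OldImage": change.get("OldImage"),  # absent on INSERT
            "SequenceNumber": change.get("SequenceNumber"),
        }


def handler(event, context):
    """Lambda entry point: count changes by type."""
    return dict(Counter(c["eventName"] for c in iter_changes(event)))


if __name__ == "__main__":
    print(handler(SAMPLE_EVENT, None))
```

Whether `NewImage` and `OldImage` are present also depends on the stream view type configured on the table.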

Page 13: Serverless Realtime Backup

What about backups?

Page 14: Serverless Realtime Backup

DynamoDB is multi-AZ durable, always…

Why do I need backups?

Page 15: Serverless Realtime Backup

Human.

Error.

Page 16: Serverless Realtime Backup

Application.

Error.

Page 17: Serverless Realtime Backup

Serverless Full Backups of DynamoDB

Page 18: Serverless Realtime Backup

Input Datanode: This could be an S3 bucket, RDS table, EMR Hive table, etc.

Activity: This is a data aggregation, manipulation, or copy that runs on a user-configured schedule.

Output Datanode: This supports all the same datasources as the input datanode, but they don’t have to be the same type.

Serverless Orchestration

Data Pipeline

Automatically Provision EC2 & EMR Resources

Manage Dependencies & Scheduling

Automatically Retry and Notify of Success & Failure


Page 19: Serverless Realtime Backup

Elastic MapReduce

Managed, elastic Yarn (1.x & 2.x) cluster

Integrates with S3, DynamoDB and Redshift

Install Spark, Presto, Impala, Hive, Pig & End User Tools Automatically

Integrated HBase NoSQL Database

Support for Spot Instances

Support for Transparent HDFS Encryption

Big Data Analytics


Page 20: Serverless Realtime Backup

2 Important Concepts

RTO (Recovery Time Objective): How long will it take to get data back?

RPO (Recovery Point Objective): When I do restore, how much data can I lose?

Page 21: Serverless Realtime Backup

Serverless Full Backups of DynamoDB

RPO: How often do you do a full backup?

RTO: How long does an import take?
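The RPO question can be put in numbers. The sketch below is a simplified illustration, not a formula from the talk: the function name is hypothetical, and it assumes the worst-case loss is the gap since the most recent copy of the data, ignoring how long a backup run itself takes.

```python
def worst_case_rpo_seconds(full_backup_interval_s, stream_buffer_s=None):
    """Worst-case data loss window, in seconds.

    full_backup_interval_s: time between periodic full backups.
    stream_buffer_s: delivery delay of a continuous stream archive, if one
        exists (the Firehose slide quotes "as little as 60" seconds).
    """
    if stream_buffer_s is None:
        # Full backups only: anything written since the last one can be lost.
        return full_backup_interval_s
    # With a stream archive, only changes still being buffered can be lost.
    return min(full_backup_interval_s, stream_buffer_s)


if __name__ == "__main__":
    daily = 24 * 3600
    print(worst_case_rpo_seconds(daily))      # daily full backup only
    print(worst_case_rpo_seconds(daily, 60))  # plus continuous archive
```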

Page 22: Serverless Realtime Backup

Serverless Stream Replication

http://bit.ly/2eNimjv

LambdaStreamsToFirehose

Page 23: Serverless Realtime Backup

meh, that’s easy

• Supports AWS Lambda and DynamoDB Update Streams

• User Defined transformers

• Kinesis Producer Library Deaggregation from Protocol Buffers

• Deterministic ordering to destination

• User defined routing rules (coming soon)
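The core idea of stream replication can be sketched in a few lines. This is not the LambdaStreamsToFirehose code; it is a minimal Python illustration that assumes a boto3-style Firehose client is passed in (for example `boto3.client("firehose")`), and the function names and the newline-delimited JSON layout are assumptions made for the example.

```python
import json

MAX_BATCH = 500  # PutRecordBatch accepts at most 500 records per call


def to_firehose_records(event):
    """Turn each stream record into one newline-terminated JSON document."""
    out = []
    for record in event.get("Records", []):
        body = dict(record["dynamodb"])  # Keys, NewImage, OldImage, ...
        body["eventName"] = record["eventName"]
        line = json.dumps(body, default=str) + "\n"
        out.append({"Data": line.encode("utf-8")})
    return out


def forward(event, firehose_client, delivery_stream_name):
    """Send records in arrival order; return how many Firehose rejected."""
    records = to_firehose_records(event)
    failed = 0
    for start in range(0, len(records), MAX_BATCH):
        response = firehose_client.put_record_batch(
            DeliveryStreamName=delivery_stream_name,
            Records=records[start:start + MAX_BATCH],
        )
        failed += response.get("FailedPutCount", 0)
    return failed
```

Passing the client in as an argument keeps the function easy to exercise with a stub. A real forwarder would also retry the rejected records rather than only counting them.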

Page 24: Serverless Realtime Backup

Serverless Continuous Backup

Page 25: Serverless Realtime Backup

so we’re done…?

Page 26: Serverless Realtime Backup

we need to ensure that we can’t do the wrong thing...

Page 27: Serverless Realtime Backup

DynamoDB Update Stream

Kinesis Firehose Delivery Stream

AWS Lambda Streams to Firehose

DDB⇒Lambda Event Source

all mandatory configurations
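Since all four pieces are mandatory, it helps to be able to check a table for gaps. The sketch below is a hypothetical checker, not part of the talk's tooling: it works on plain dicts and lists shaped like the responses of `describe_table`, `list_delivery_streams`, `list_functions` and `list_event_source_mappings`, and every name in it is illustrative.

```python
def missing_backup_components(table_description, delivery_stream_names,
                              function_names, event_source_mappings,
                              delivery_stream, forwarder_function):
    """Return the mandatory backup components that are not in place."""
    missing = []
    table = table_description["Table"]
    stream_arn = table.get("LatestStreamArn")

    # 1. The table must have its Update Stream switched on.
    if not table.get("StreamSpecification", {}).get("StreamEnabled"):
        missing.append("DynamoDB Update Stream")

    # 2. A Firehose delivery stream must exist to archive the changes.
    if delivery_stream not in delivery_stream_names:
        missing.append("Kinesis Firehose Delivery Stream")

    # 3. The forwarding Lambda function must be deployed.
    if forwarder_function not in function_names:
        missing.append("AWS Lambda Streams to Firehose")

    # 4. An enabled event source mapping must read from the table's stream.
    wired = any(
        m.get("EventSourceArn") == stream_arn and m.get("State") == "Enabled"
        for m in event_source_mappings
    )
    if stream_arn is None or not wired:
        missing.append("DDB => Lambda Event Source")

    return missing
```

An empty list means the table is fully covered. Anything else names the component that still has to be provisioned.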

Page 28: Serverless Realtime Backup

Policies, or technology?

Page 29: Serverless Realtime Backup

Serverless Account Audit Trails

Page 30: Serverless Realtime Backup
Page 31: Serverless Realtime Backup
Page 32: Serverless Realtime Backup
Page 33: Serverless Realtime Backup

Backup Provisioning Architecture

http://bit.ly/2dWqNMO

Page 34: Serverless Realtime Backup
Page 35: Serverless Realtime Backup

OK – backup covered. Now we’re done…?

Page 36: Serverless Realtime Backup

Components of a Backup System

Periodic full backups

Incremental change capture

Ability to restore data

Page 37: Serverless Realtime Backup

Restore

{"Keys":{"MyHashKey":{"S":"abc"}},"NewImage":{"123":{"S":"asdfasdf

{"Keys":{"MyHashKey":{"S":"abc"}},"NewImage":{"123":{"S":"asdfasq223qdf"},"

Page 38: Serverless Realtime Backup

Restore

add jar s3://mybucket/prefix/json-serde-1.3.8-SNAPSHOT-jar-with-dependencies.jar;

create external table MyTable_<YYYY><MM><DD>_<HH>(
  Keys map<string,map<string,string>>,
  NewImage map<string,map<string,string>>,
  OldImage map<string,map<string,string>>,
  SequenceNumber string,
  SizeBytes bigint,
  eventName string)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
location 's3://backup-bucket/backup-prefix/MyTable/<YYYY>/<MM>/<DD>/<HH>';

select OldImage['attribute1']['s'],
  NewImage['attribute1']['s'],
  SequenceNumber,
  SizeBytes,
  EventName
from MyTable_<YYYY><MM><DD>_<HH>
where Keys['MyHashKey']['s'] = <some hash key value of the item>
order by SequenceNumber desc;
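For a handful of backup files, the same lookup can be done without a Hive cluster. The sketch below assumes the backup objects hold one JSON change record per line with the fields used in the table definition above; the function name and the sample lines are made up for illustration.

```python
import json


def item_history(lines, hash_key_name, hash_key_value):
    """Return all changes for one item, newest first.

    Mirrors the Hive query: filter on the hash key, then order by
    SequenceNumber descending. Assumes a string ("S") hash key.
    """
    matches = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        key = record.get("Keys", {}).get(hash_key_name, {})
        if key.get("S") == hash_key_value:
            matches.append(record)
    # Sequence numbers are numeric strings, so compare them as integers.
    matches.sort(key=lambda r: int(r["SequenceNumber"]), reverse=True)
    return matches


if __name__ == "__main__":
    # Hypothetical backup lines, as they might be read from one S3 object.
    sample = [
        '{"Keys":{"MyHashKey":{"S":"abc"}},"NewImage":{"attribute1":{"S":"v1"}},"SequenceNumber":"900","eventName":"INSERT"}',
        '{"Keys":{"MyHashKey":{"S":"xyz"}},"NewImage":{"attribute1":{"S":"other"}},"SequenceNumber":"950","eventName":"INSERT"}',
        '{"Keys":{"MyHashKey":{"S":"abc"}},"OldImage":{"attribute1":{"S":"v1"}},"NewImage":{"attribute1":{"S":"v2"}},"SequenceNumber":"1000","eventName":"MODIFY"}',
    ]
    for change in item_history(sample, "MyHashKey", "abc"):
        print(change["SequenceNumber"], change["eventName"])
```

The first record returned is the most recent known state of the item. If its `eventName` is `REMOVE`, the item was deleted and its `OldImage` holds the last values.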

Page 39: Serverless Realtime Backup

Restore

Page 40: Serverless Realtime Backup

Now comes the hard part…

Page 41: Serverless Realtime Backup

Serverless isn't just Lambda…

...it’s also...

…streaming replication

…streaming data archival

…audit logging and stream production

…long term, high durability storage

…orchestration of full backups

Page 42: Serverless Realtime Backup
Page 43: Serverless Realtime Backup

Serverless Conf London 2016