Serverless Realtime Backup
Posted 12-Apr-2017 (Category: Technology)

TRANSCRIPT

Serverless Realtime Backup and Restore of DynamoDB with AWS Lambda
Ian Meyers, Principal Solution Architect, Amazon Web Services EMEA

Not your regular Serverless talk: no Slackbots, no Pizza, no Alexa, no Connected Home, no Driverless Cars, no Smart Cities.

Computational Primitives
• Logic: AWS Lambda
• State: Amazon DynamoDB, Amazon S3
• Message Passing: Amazon API Gateway, Amazon Kinesis, Amazon SNS

Serverless Compute: AWS Lambda
Fully managed event processor (Node.js, Python, or Java)
• Natively compile & install any type of dependency
• Specify runtime RAM & timeout
• Automatically scaled to support event volume
• Integrated CloudWatch logging
• REST interface with API Gateway
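As a minimal sketch of the programming model: Lambda hands the handler a batch of event records and routes anything printed to CloudWatch Logs. This example assumes the function is wired to a DynamoDB Update Stream (the event shape and eventName values INSERT/MODIFY/REMOVE follow the documented stream event format; the counting logic itself is just illustrative):

```python
import json

def lambda_handler(event, context):
    """Count the change types in one batch of DynamoDB stream records."""
    counts = {"INSERT": 0, "MODIFY": 0, "REMOVE": 0}
    for record in event.get("Records", []):
        name = record.get("eventName")
        if name in counts:
            counts[name] += 1
    # stdout lands in CloudWatch Logs automatically
    print(json.dumps(counts))
    return counts
```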

[Slide diagram: the AWS platform stack (Compute, Storage, Database, Networking, App Services, Analytics, Deployment & Administration) on the AWS Global Infrastructure]

Serverless Messaging: Amazon Kinesis
Managed service for real-time big data processing
• Create streams to produce & consume data
• Elastically add and remove shards for performance
• Use the Kinesis Client Library to process data
• Integration with S3, Redshift and DynamoDB
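Shards are why Kinesis scales elastically: each record is routed to a shard by the MD5 hash of its partition key, and adding shards splits the hash-key space further. A sketch of that routing, under the simplifying assumption that the 128-bit hash space is split evenly across shards:

```python
import hashlib

def shard_for_key(partition_key: str, num_shards: int) -> int:
    """Pick the shard whose hash-key range contains MD5(partition_key),
    mirroring how PutRecord assigns records to shards."""
    hash_value = int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)
    range_size = 2 ** 128 // num_shards
    return min(hash_value // range_size, num_shards - 1)
```

The same key always lands on the same shard, which is what preserves per-key ordering for consumers.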


Serverless Stream Archive: Amazon Kinesis Firehose
• Zero administration: capture and deliver streaming data into S3, Redshift, and other destinations without writing an application or managing infrastructure.
• Direct-to-data-store integration: batch, compress, and encrypt streaming data for delivery into data destinations in as little as 60 seconds using simple configurations.
• Seamless elasticity: seamlessly scales to match data throughput without intervention.
Capture and submit streaming data to Firehose; Firehose loads it continuously into S3 and Redshift; analyze it using your favorite BI tools.
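A producer pushing to Firehose does have one job to do itself: PutRecordBatch accepts at most 500 records per call, so the buffer must be chunked. A minimal sketch (the boto3 call is shown only in comments; the delivery stream name is an assumption, not from the deck):

```python
def batches(records, max_batch=500):
    """Chunk a record buffer to respect the PutRecordBatch limit
    of 500 records per call."""
    for i in range(0, len(records), max_batch):
        yield records[i:i + max_batch]

# With a boto3 client, delivery would look roughly like:
#   firehose = boto3.client("firehose")
#   for group in batches(lines):
#       firehose.put_record_batch(
#           DeliveryStreamName="ddb-backup-stream",
#           Records=[{"Data": line + "\n"} for line in group])
```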

Serverless Database: Amazon DynamoDB
Provisioned-throughput NoSQL database
• Fast, predictable, configurable performance
• Fully distributed, fault-tolerant HA architecture
• Update Streams provide DB notifications
• Integration with EMR & Hive


DynamoDB Internals: Partitions
A table's hash key space (0000..NNNN) is split across partitions (Partition 1, Partition 2, … Partition N). Each partition serves roughly 1000 write IOPS (at 1KB per write) and 3000 read IOPS (at 4KB per read).
Partitions are three-way replicated: every item (e.g. Id=1/Name=Jim; Id=2/Name=Andy/Dept=Engg; Id=3/Name=Kim/Dept=Ops) is stored in Facility 1, Facility 2, and Facility 3.
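Those per-partition limits imply how many partitions a table needs. A back-of-the-envelope estimate, assuming the sizing guidance DynamoDB documented at the time (throughput divided by the per-partition limits, and roughly 10GB of data per partition; the helper name is illustrative):

```python
import math

def estimated_partitions(read_capacity, write_capacity, size_gb):
    """Rough partition count: whichever of throughput or storage
    demands more partitions wins."""
    by_throughput = math.ceil(read_capacity / 3000 + write_capacity / 1000)
    by_size = math.ceil(size_gb / 10)
    return max(by_throughput, by_size, 1)
```

For example, a 5GB table provisioned at 3000 RCU and 1000 WCU would need about two partitions, each receiving half of the provisioned throughput.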

DynamoDB Internals: Streams
Each hash range of the table (0000..NNNN, NNNN..MMMM) has a corresponding shard in the table's Update Stream, which emits a record for every INSERT, UPDATE, and DELETE at up to 2MB/sec per shard.

What about backups?
DynamoDB is multi-AZ durable, always… so why do I need backups?
Human. Error.
Application. Error.

Serverless Full Backups of DynamoDB
• Input Datanode: this could be an S3 bucket, RDS table, EMR Hive table, etc.
• Activity: a data aggregation, manipulation, or copy that runs on a user-configured schedule.
• Output Datanode: supports all the same datasources as the input datanode, but they don't have to be the same type.

Serverless Orchestration: AWS Data Pipeline
• Automatically provision EC2 & EMR resources
• Manage dependencies & scheduling
• Automatically retry and notify of success & failure


Big Data Analytics: Elastic MapReduce
Managed, elastic YARN (1.x & 2.x) cluster
• Integrates with S3, DynamoDB and Redshift
• Install Spark, Presto, Impala, Hive, Pig & end-user tools automatically
• Integrated HBase NoSQL database
• Support for Spot Instances
• Support for transparent HDFS encryption


2 Important Concepts
• RTO: how long will it take to get data back?
• RPO: when I do restore, how much data can I lose?

Serverless Full Backups of DynamoDB
• RPO: how often do you do a full backup?
• RTO: how long does an import take?

Serverless Stream Replication: LambdaStreamsToFirehose (http://bit.ly/2eNimjv)
meh, that's easy
• Supports AWS Lambda and DynamoDB Update Streams
• User-defined transformers
• Kinesis Producer Library deaggregation from Protocol Buffers
• Deterministic ordering to destination
• User-defined routing rules (coming soon)
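The project itself is Node.js, but the transformer idea is easy to sketch in Python: take one DynamoDB Update Stream record and flatten it into a line of JSON for the Firehose archive. The field selection here mirrors the backup records shown in the Restore section of this deck (Keys, NewImage, OldImage, SequenceNumber, SizeBytes, eventName); treat it as an assumed, illustrative transformer rather than the library's actual one:

```python
import json

def to_backup_line(stream_record):
    """Flatten one update-stream record into a newline-delimited
    JSON archive line."""
    ddb = stream_record["dynamodb"]
    return json.dumps({
        "Keys": ddb["Keys"],
        "NewImage": ddb.get("NewImage"),
        "OldImage": ddb.get("OldImage"),
        "SequenceNumber": ddb["SequenceNumber"],
        "SizeBytes": ddb.get("SizeBytes"),
        "eventName": stream_record["eventName"],
    })
```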

Serverless Continuous Backup
so we're done…? We need to ensure that we can't do the wrong thing…
DynamoDB Update Stream → AWS Lambda Streams to Firehose (via a DDB⇒Lambda event source) → Kinesis Firehose Delivery Stream: all mandatory configurations.
Policies, or technology?
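One way to answer "technology" is to verify the mandatory configuration in code before a table goes into service. A hedged sketch: check the Table element returned by DescribeTable for an enabled update stream carrying both images, which a stream-to-Firehose backup needs to capture old and new values (the function name and the NEW_AND_OLD_IMAGES requirement are this sketch's assumptions):

```python
def backup_ready(table_description: dict) -> bool:
    """True if the table (DescribeTable's 'Table' element) has an update
    stream enabled with both old and new images."""
    spec = table_description.get("StreamSpecification", {})
    return bool(spec.get("StreamEnabled")) and \
        spec.get("StreamViewType") == "NEW_AND_OLD_IMAGES"
```

A deployment pipeline or account-audit Lambda could refuse to proceed whenever this check fails.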

Serverless Account Audit Trails: Backup Provisioning Architecture (http://bit.ly/2dWqNMO)

OK – backup covered. Now we're done…?

Components of a Backup System
• Periodic full backups
• Incremental change capture
• Ability to restore data

Restore
Archived change records are newline-delimited JSON:

{"Keys":{"MyHashKey":{"S":"abc"}},"NewImage":{"123":{"S":"asdfasdf
{"Keys":{"MyHashKey":{"S":"abc"}},"NewImage":{"123":{"S":"asdfasq223qdf"},"

Query them from Hive:

add jar s3://mybucket/prefix/json-serde-1.3.8-SNAPSHOT-jar-with-dependencies.jar;

create external table MyTable_<YYYY><MM><DD>_<HH> (
  Keys map<string,map<string,string>>,
  NewImage map<string,map<string,string>>,
  OldImage map<string,map<string,string>>,
  SequenceNumber string,
  SizeBytes bigint,
  eventName string
)
row format serde 'org.openx.data.jsonserde.JsonSerDe'
location 's3://backup-bucket/backup-prefix/MyTable/<YYYY>/<MM>/<DD>/<HH>';

select OldImage['attribute1']['s'], NewImage['attribute1']['s'], SequenceNumber, SizeBytes, eventName
from MyTable_<YYYY><MM><DD>_<HH>
where Keys['MyHashKey']['s'] = <some hash key value of the item>
order by SequenceNumber desc;
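The same latest-change-wins logic the Hive query expresses with order by SequenceNumber can be sketched for a programmatic restore: fold the archived records down to the final image per item key, dropping keys whose last event was a delete (the helper name is illustrative; the record fields match the archive format above):

```python
import json

def latest_images(backup_lines):
    """Fold archived change records down to the newest image per item key,
    dropping items whose final event was a REMOVE."""
    state = {}
    records = (json.loads(line) for line in backup_lines if line.strip())
    for rec in sorted(records, key=lambda r: int(r["SequenceNumber"])):
        key = json.dumps(rec["Keys"], sort_keys=True)
        if rec["eventName"] == "REMOVE":
            state.pop(key, None)
        else:
            state[key] = rec["NewImage"]
    return state
```

The surviving images could then be written back with BatchWriteItem to complete the restore.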

Now comes the hard part…

Serverless isn't just Lambda… it's also…
• streaming replication
• streaming data archival
• audit logging and stream production
• long-term, high-durability storage
• orchestration of full backups

Serverless Conf London 2016
