
#ITDEVCONNECTIONS | ITDEVCONNECTIONS.COM

A Journey to DynamoDB and maybe away from DynamoDB

Adam Dockter, VP of Engineering

ServiceTarget


Who are we?

Small Company

4 Developers

AWS Infrastructure

NO QA!!


About our product

Self-service web application powered by a knowledge base


Journey Phase 1 - Create a new application

● One Developer

● Get to market fast

● Plan to evolve the product

● Plan to grow product and team

Phase 1 - Define our data model

● Building a knowledge base application

● Need document storage

● Minimal configuration storage

● No strong relational needs



Phase 1 - Define our data model - NoSQL

● NoSQL was all the rage

● Schemaless seemed to support ideas for evolving the product

● JSON is fast, flexible, and ubiquitous across languages


What is NoSQL?

[Diagram: a Table contains Items; each Item is a set of Attributes, including a Partition Key and a Sort Key]


What is NoSQL?

Key/Value Storage

{
  id: `6aZtc79`,
  type: `article`,
  title: `Dynamo DB Overview`,
  body: `Amazon DynamoDB is a fully managed NoSQL database service that provides fast and predictable performance with seamless scalability`,
  modified: `20180919`
}

{
  id: `8xE220rt`,
  type: `contact`,
  name: `Adam Dockter`,
  email: `[email protected]`,
  phone: `406-555-9555`,
  dob: `19800221`,
  modified: `20180919`
}

[Slide labels: both items are stored in the CONTENT TABLE; each item is annotated with its KEY and SORT KEY]


What is NoSQL?

SQL                      NoSQL
Normalized/relational    Denormalized/hierarchical
Ad hoc queries           Instantiated views
Scale vertically         Scale horizontally
Good for OLAP            Built for OLTP at scale



Phase 1 - Define our data model - NoSQL

We did it!

Phase 1 - Define our data model - NoSQL - MongoDB


Journey Phase 2 - Scale and Grow

Dev Ops Challenges


Running your own DB

❏ Replication in primary region

❏ Replication to failover region

❏ Replication across AWS AZs

❏ Security of DB

❏ Upgrades and patches of DB

❏ Backups of DB

❏ Security of host machine

❏ Upgrades and patches of host machine

❏ Health checks and alarms

❏ Enough capacity / Performance

❏ Stuff I can’t even think of...

Running DynamoDB

❏ Replication in primary region

❏ Replication to failover region

❏ Replication across AWS AZs

❏ Security of DB

❏ Upgrades and patches of DB

❏ Backups of DB

❏ Security of host machine

❏ Upgrades and patches of host machine

❏ Health checks and alarms

❏ Enough capacity / Performance

❏ Stuff I can’t even think of...


Journey Phase 2 - Dev Ops Challenges - DynamoDB



Dynamo Development

● Devtools / Discovery / Troubleshooting

● Downloadable / Offline Environment


Dynamo Development - Programming API

● Mapper - Available in Java and .NET. Super easy, but has limitations

● Document - Available in Java, .NET and Javascript

● Low Level - Basically working directly against the REST API


Dynamo Development - Application

Java API

Create a mapper

    import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
    import com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBMapper;

    var client = AmazonDynamoDBClientBuilder.standard().build();
    var mapper = new DynamoDBMapper(client);

Create an item

    var item = new Catalog(102, "Book Title");
    mapper.save(item);

Get item by id

    var loaded = mapper.load(Catalog.class, 102);

Javascript Migration tool

Create a client

    import {DynamoDB} from 'aws-sdk';

    let client = new DynamoDB.DocumentClient(config);

Query item

    let params: DynamoDB.DocumentClient.QueryInput = {
        TableName: 'catalog',
        KeyConditionExpression: 'id = :id',
        ExpressionAttributeValues: {
            ':id': itemId
        }
    };

    // query() is called on the DocumentClient itself
    let queryResult = await client.query(params).promise();


Journey Phase 3 - Expand features

Dynamo Challenges

● Capacity / Cost

● Backups

● Replication to failover

● Indexing data

● Query capabilities


Dynamo Challenges - Capacity and Cost

When we started

Tables are provisioned with and billed by read and write capacity units

● RCU - one strongly consistent read per second of an item up to 4 KB

● WCU - one write per second of an item up to 1 KB

● Forecasting is incredibly hard when your customer data, volumes, growth, etc. are not exact or known

● When limits are hit your app is down - get ready for 5xx responses

● Auto scale with lambda and CLI calls

Single digit response times are counterintuitive

What’s new since then

DynamoDB team introduces auto-scaling

● It’s not great, especially for spikes

DynamoDB team introduces cache accelerators (DAX)

● Solves the issue with burst reads on a table

● Can only cache data in the storage schema

● We don’t use this; we use our own caching strategy with ElastiCache, which allows us to cache higher level processed data schemas
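To make the forecasting problem concrete, here is a minimal sketch of the capacity arithmetic, assuming the standard rounding rules: a read is charged in 4 KB blocks, a write in 1 KB blocks, and an eventually consistent read costs half a strongly consistent one. The function names are hypothetical and not part of any AWS SDK.

```typescript
// Sizes are in bytes; rates are requests per second.
const RCU_BLOCK_BYTES = 4 * 1024; // one RCU covers a read of up to 4 KB
const WCU_BLOCK_BYTES = 1 * 1024; // one WCU covers a write of up to 1 KB

// Provisioned read capacity needed for a steady read rate on items of one size.
function readCapacityUnits(
  itemSizeBytes: number,
  readsPerSecond: number,
  stronglyConsistent: boolean = true
): number {
  const unitsPerRead = Math.ceil(itemSizeBytes / RCU_BLOCK_BYTES);
  const factor = stronglyConsistent ? 1 : 0.5; // eventually consistent reads cost half
  return Math.ceil(unitsPerRead * readsPerSecond * factor);
}

// Provisioned write capacity needed for a steady write rate on items of one size.
function writeCapacityUnits(itemSizeBytes: number, writesPerSecond: number): number {
  const unitsPerWrite = Math.ceil(itemSizeBytes / WCU_BLOCK_BYTES);
  return Math.ceil(unitsPerWrite * writesPerSecond);
}
```

A 6 KB article read 10 times per second rounds up to two blocks per read, so the item size matters as much as the request rate when guessing at a table's provisioning.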


Dynamo Challenges - Backups

Dynamo backups were not available and they were a nightmare

Solution 1

● Lambda function that ran nightly to read dynamo data and write to S3

Solution 2

● Lambda function that ran nightly

● Scale up RCU, read data, retry with exponential back off when capacity limit hit, write to S3, scale down RCU

Solution 3

● Step functions, lambda functions and retries... oh my!

DynamoDB team introduced built in backups

● Instant

● API

● Restore deleted tables

● Infinite backups

● Don’t pay for it

Simple solution now is a lambda function that uses backup API

● Under 2 hours for the new backup solution vs. 4-6 person-days

● Point in time backups provided by Dynamo
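The "retry with exponential back off" step from Solution 2 can be sketched roughly as follows. This is an illustration under assumptions, not the code the team ran: `withBackoff`, `backoffDelayMs`, and the `isThrottled` predicate are hypothetical names, the delay values are arbitrary, and jitter is left out for brevity.

```typescript
// Delay before retry number `attempt` (0-based): doubles each time, up to a cap.
function backoffDelayMs(attempt: number, baseMs: number = 100, capMs: number = 5000): number {
  return Math.min(capMs, baseMs * Math.pow(2, attempt));
}

// Runs `operation`, retrying while `isThrottled(error)` says the failure was a
// capacity limit. Any other error, or running out of attempts, is rethrown.
async function withBackoff<T>(
  operation: () => Promise<T>,
  isThrottled: (error: unknown) => boolean,
  maxAttempts: number = 5,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms))
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await operation();
    } catch (error) {
      if (!isThrottled(error) || attempt + 1 >= maxAttempts) {
        throw error;
      }
      await sleep(backoffDelayMs(attempt));
    }
  }
}
```

Inside a Lambda function the total of all delays has to fit within the function timeout, which is one reason a long nightly export ends up split across step functions.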


Dynamo Challenges - Replication to Failover

Incredibly difficult for us to move data from our primary region to our failover region

Admittedly we may be somewhat at fault here

How we replicated

● Describe all tables and recreate

● Read data from primary and copy to failover

● Constantly battling RCU/WCU limits

● Constantly battling Lambda timeouts

Similar solution to backups

● Step functions, lambda functions and retries... oh my!

Data streams

● Data streams and continuous data replication with write triggers

● Was likely there when we started, we just didn’t consider it

● Seemed expensive, but probably cheaper than replicating the entire data set nightly

● 3 TB per month of data transfer cost and growing

DynamoDB team introduced built in global tables

● Replicate tables across regions and keep data in sync for you

● Simple solution now will be global table + lambda function for backups
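The stream-based alternative to nightly full copies can be pictured as replaying a change log against the failover copy. The sketch below uses an in-memory object as the replica and a simplified record shape. Real stream records do carry an `eventName` of INSERT, MODIFY or REMOVE, but their keys and images use the typed attribute-value format, so `ChangeRecord`, `Item` and `applyChanges` here are assumptions made for illustration.

```typescript
type Item = { [attribute: string]: string | number };

// Simplified stand-in for one stream record.
interface ChangeRecord {
  eventName: "INSERT" | "MODIFY" | "REMOVE";
  key: string;     // simplified: a single string key
  newImage?: Item; // the item after the write; absent for REMOVE
}

// Applies a batch of changes, in order, to the failover copy.
function applyChanges(replica: { [key: string]: Item }, records: ChangeRecord[]): void {
  for (const record of records) {
    if (record.eventName === "REMOVE") {
      delete replica[record.key];
    } else if (record.newImage !== undefined) {
      replica[record.key] = record.newImage; // INSERT and MODIFY both overwrite
    }
  }
}
```

Only the items that changed cross regions, which is the cost argument for streams over copying every table each night.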


Journey Phase 2 - Dynamo Challenges - Uptake Capabilities (now)

Autoscaling               complete
Built in backups          complete
Dynamo Accelerator DAX    no need
Data streams              consider
Global Tables             consider

Journey Phase 2 - Dynamo Challenges - Uptake Capabilities - Better Patterns (next)

● Capacity / Cost

● Backups

● Replication to failover

● Indexing data

● Query capabilities


Misconceptions

Flexibility is commonly touted as the reason to use NoSQL databases.

Reality is NoSQL schema is tightly coupled to application specific access patterns which is an inherently inflexible design choice as it restricts what can be done in the future without restructuring the data.

RDBMS leverages normalized schema coupled with an ad hoc query engine to provide arguably more flexibility and less developer overhead in design and build phases of a project. ...

Rick Houlihan


What is NoSQL?

SQL                      NoSQL
Normalized/relational    Denormalized/hierarchical
Ad hoc queries           Instantiated views
Scale vertically         Scale horizontally
Good for OLAP            Built for OLTP at scale


Dynamo Challenges - Indexing

Querying is limited

● Only by partition key

● Data is sorted by a single sort key

Scanning is expensive

● Scanning selects all data, then filters

Secondary Indexes

● Local Secondary Index

○ Same partition key, different sort key

○ Limit of 5

○ Has to be created when the table is created

● Global Secondary Index

○ Different partition key and sort key

○ Limit of 5

○ Can be created at any time

○ Pay for RCU/WCU on these indexes
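The difference between a query and a scan is easier to see with a toy model. The class below is a hypothetical in-memory stand-in, not DynamoDB's implementation: rows are grouped by partition key and kept in sort-key order, and `rowsRead` counts how many rows each call had to touch.

```typescript
interface Row {
  pk: string;   // partition key
  sk: string;   // sort key
  type: string; // an ordinary, non-key attribute
}

class TinyTable {
  private partitions: { [pk: string]: Row[] } = {};
  public rowsRead = 0;

  put(row: Row): void {
    const rows = (this.partitions[row.pk] = this.partitions[row.pk] || []);
    rows.push(row);
    // Keep each partition ordered by its single sort key.
    rows.sort((a, b) => (a.sk < b.sk ? -1 : a.sk > b.sk ? 1 : 0));
  }

  // Query: only the requested partition is read.
  query(pk: string): Row[] {
    const rows = this.partitions[pk] || [];
    this.rowsRead += rows.length;
    return rows.slice();
  }

  // Scan: every row in every partition is read, and the filter runs afterwards.
  scan(filter: (row: Row) => boolean): Row[] {
    const matches: Row[] = [];
    for (const pk of Object.keys(this.partitions)) {
      for (const row of this.partitions[pk]) {
        this.rowsRead += 1;
        if (filter(row)) {
          matches.push(row);
        }
      }
    }
    return matches;
  }
}
```

Filtering on `type` forces a scan in this model because `type` is not a key; a secondary index is essentially a second copy of the data laid out so that the same lookup becomes a query.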


Dynamo Challenges - Querying capabilities

Paging - limited support

● Paging is done through a LastEvaluatedKey

● Limited to a single sort key

● Jump to page X not possible

● Go back to page Y not easy

Aggregations - no support

Joins - no support

Transactions - limited support

● An operation on a table is atomic

● Operations across tables are not
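Cursor-style paging explains why "jump to page X" is not possible. In the sketch below, `queryPage` is a hypothetical stand-in for a query call over keys that are already sorted: it returns one page plus a `lastEvaluatedKey` cursor, and omits the cursor on the final page (a simplification of the real behaviour).

```typescript
interface Page {
  items: string[];
  lastEvaluatedKey?: string; // cursor for the next call; absent on the last page
}

// Returns up to `limit` keys that come after `exclusiveStartKey`.
function queryPage(sortedKeys: string[], limit: number, exclusiveStartKey?: string): Page {
  const start =
    exclusiveStartKey === undefined ? 0 : sortedKeys.indexOf(exclusiveStartKey) + 1;
  const items = sortedKeys.slice(start, start + limit);
  const hasMore = start + limit < sortedKeys.length;
  return hasMore ? { items, lastEvaluatedKey: items[items.length - 1] } : { items };
}

// Reaching page N means fetching pages 1..N-1 first to collect each cursor.
function getPage(sortedKeys: string[], limit: number, pageNumber: number): string[] {
  let page = queryPage(sortedKeys, limit);
  for (let current = 1; current < pageNumber; current++) {
    if (page.lastEvaluatedKey === undefined) {
      return []; // ran out of data before reaching the requested page
    }
    page = queryPage(sortedKeys, limit, page.lastEvaluatedKey);
  }
  return page.items;
}
```

Going back a page has the same problem in reverse: the application has to remember the cursors it has already seen, because the API only hands out the next one.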


Pattern Improvements

Building blocks

DynamoDB Streams

● Change log of data

● All write operations show up on a stream

● Can be hooked up to lambda

Lambda

● Stored procedure engine

● Completely isolated process space

● Cannot crush performance of db

** Storage is cheap, computation is more expensive

Patterns

Generated Attributes

● Concatenate two or more attributes together

Better use of indexes as instantiated views

● Generate attributes for scenarios

● Filtered data sets

Transactions

● Versioning pattern

● Metadata records

● Conditional Updates - optimistic locking

Adjacency list and materialized graphs!
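The "Conditional Updates - optimistic locking" pattern can be sketched with a version number that every write must match. The in-memory table and the `conditionalUpdate` helper below are hypothetical; against DynamoDB the same check would be expressed as a condition on the version attribute attached to the write.

```typescript
interface VersionedItem {
  id: string;
  body: string;
  version: number;
}

// The write succeeds only if the stored version is still the one the caller read.
// Returns false when another writer got there first; the caller re-reads and retries.
function conditionalUpdate(
  table: { [id: string]: VersionedItem },
  id: string,
  newBody: string,
  expectedVersion: number
): boolean {
  const current = table[id];
  if (current === undefined || current.version !== expectedVersion) {
    return false; // condition failed: item is missing or was changed since it was read
  }
  table[id] = { id: id, body: newBody, version: expectedVersion + 1 };
  return true;
}
```

Two writers that both read version 1 cannot silently overwrite each other: the first update bumps the version to 2, so the second one fails its check instead of clobbering the first.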


Stay or go?

● Continue to learn better practices

● Leverage more integrated capabilities of AWS

● Keep uptaking new enhancements as they come out

● Microservices - each service can decide


Questions?


References

Rick Houlihan: Pattern Presentation

● https://www.youtube.com/watch?time_continue=331&v=jzeKPKpucS0

● https://www.slideshare.net/AmazonWebServices/advanced-design-patterns-for-amazon-dynamodb-dat403-reinvent-2017
