AWS as a Data Platform

AWS Government, Education, and Nonprofit Symposium, Washington, DC | June 25-26, 2015. AWS as a Data Platform. Joe Healy, Sr. Consultant – AWS WWPS Professional Services. ©2015, Amazon Web Services, Inc. or its affiliates. All rights reserved.

Upload: amazon-web-services

Post on 14-Aug-2015



AWS as a Data Platform

Joe Healy

Sr. Consultant – AWS WWPS Professional Services

©2015, Amazon Web Services, Inc. or its affiliates. All rights reserved.


Unconstrained growth: Big data is moving fast

GB, TB, PB, EB, ZB

• 2.8 trillion GB in 2012
• 40 trillion GB in 2020
• 95% unstructured
• 70% user-generated

Source: IDC
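As a rough check on the two IDC volume figures, the growth from 2012 to 2020 can be turned into an implied yearly rate. The sketch below assumes smooth compound growth between the two data points, which the slide does not claim; the variable names are illustrative.

```python
# Figures from the slide (IDC): total data volume in trillions of GB.
volume_2012 = 2.8
volume_2020 = 40.0
years = 2020 - 2012

# Overall growth factor and the implied compound annual growth rate,
# assuming the volume grows by the same percentage every year.
growth_factor = volume_2020 / volume_2012
annual_growth = growth_factor ** (1 / years) - 1

print(f"growth factor 2012-2020: {growth_factor:.1f}x")
print(f"implied annual growth:   {annual_growth:.1%}")
```

Under that assumption the total volume grows by a bit under 40% each year.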


It’s not just about size

data

volume: gigabytes, terabytes, petabytes

velocity: millisecond, second, minute, hour, day

variety: structured, unstructured, text, binary

Why AWS?

lower costs

ease of use

lower costs

no capital investment

pay as you go

no subscriptions

pay only for what you use

ease of use

programmable

zero admin

easy to configure

integrates with existing tools

one tool to rule them all



Use the right tools


Movement and coordination

Data Pipeline, Direct Connect, Storage Gateway, Import/Export

Storage and analysis services

Instance storage: EC2, EBS, EFS

SQL stores: Redshift, RDS

Hadoop: EMR

NoSQL: DynamoDB

Stream: Amazon Kinesis

Storage services: S3, CloudFront, Amazon Glacier

Machine learning: Amazon Machine Learning

Movement and coordination


Movement and coordination: Plumbing

Direct Connect: dedicated network pipes

Storage Gateway: storage backup and archiving

Import/Export: ship us your disks

Movement and coordination: Orchestration

AWS Data Pipeline

resource management

scheduling, execution, and retry

dependency tracking

failure notification

AWS and on-premises
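The orchestration ideas on the Data Pipeline slide (execution in order, retry, dependency tracking, failure notification) can be illustrated with a small generic sketch. This is not the AWS Data Pipeline API: `run_pipeline`, the task names, and the `notify` callback are all hypothetical, and it needs Python 3.9 or later for `graphlib`.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_pipeline(tasks, dependencies, max_attempts=3, notify=print):
    """Run tasks in dependency order, retrying each failed task.

    tasks:        dict mapping task name -> zero-argument callable
    dependencies: dict mapping task name -> set of task names it depends on
    Returns the names of the tasks that completed, in execution order.
    """
    completed = []
    # static_order() yields each task only after all of its dependencies.
    for name in TopologicalSorter(dependencies).static_order():
        for attempt in range(1, max_attempts + 1):
            try:
                tasks[name]()
                completed.append(name)
                break
            except Exception as exc:
                notify(f"{name}: attempt {attempt} failed: {exc}")
        else:
            # Every attempt failed: stop so that dependent tasks never run.
            notify(f"{name}: giving up after {max_attempts} attempts")
            break
    return completed


# Hypothetical three-step pipeline in which "load" fails once, then succeeds.
attempts = {"load": 0}


def flaky_load():
    attempts["load"] += 1
    if attempts["load"] < 2:
        raise RuntimeError("temporary failure")


tasks = {"extract": lambda: None, "load": flaky_load, "report": lambda: None}
dependencies = {"extract": set(), "load": {"extract"}, "report": {"load"}}
order = run_pipeline(tasks, dependencies)
```

A managed service adds what this sketch leaves out, such as scheduling over time and provisioning the resources the tasks run on.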

Data storage and analysis


Storage services: Object store

Amazon S3

Store objects (like “files”)

Objects are stored in buckets

Buckets keep data in a single AWS region, replicated across multiple facilities

- cross-region replication

highly durable, highly available, highly scalable

- 99.999999999% durability
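To give the durability figure some scale, the sketch below reads 99.999999999% as the probability that a single object survives one year and derives the average time between object losses for a given object count. That reading, and the function name, are assumptions made for illustration.

```python
from fractions import Fraction

# Durability figure from the slide, taken here as the per-object probability
# of surviving one year (an assumption about how the figure is defined).
durability = Fraction(99_999_999_999, 100_000_000_000)
annual_loss_probability = 1 - durability  # 1 in 100 billion


def expected_years_per_lost_object(object_count):
    """Average number of years between single-object losses."""
    expected_losses_per_year = object_count * annual_loss_probability
    return 1 / expected_losses_per_year


years = expected_years_per_lost_object(10_000_000)
print(f"10 million objects: about one loss every {int(years):,} years")
```

Exact fractions are used instead of floats so that the eleven nines are not lost to rounding.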


Storage services: Archive storage

Amazon Glacier

low-cost, durable archive

“cold storage” tape replacement

infrequently accessed data

integrated S3 lifecycle policies

99.999999999% durability

immutable
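The "integrated S3 lifecycle policies" item means that objects can be moved from S3 to Glacier by rule once they reach a certain age. The sketch below models that idea only: the rule fields, the 30-day threshold, and `storage_class_for` are hypothetical and do not reproduce the actual S3 lifecycle configuration schema.

```python
from datetime import date

# Hypothetical rule: archive objects under "logs/" 30 days after creation.
# The field names are illustrative, not the exact S3 API schema.
archive_rule = {"prefix": "logs/", "transition_after_days": 30, "target": "GLACIER"}


def storage_class_for(key, created, today, rule=archive_rule):
    """Return the storage class an object would have under the rule."""
    age_days = (today - created).days
    if key.startswith(rule["prefix"]) and age_days >= rule["transition_after_days"]:
        return rule["target"]
    return "STANDARD"


old_log = storage_class_for("logs/app.gz", date(2015, 1, 1), date(2015, 3, 1))
new_log = storage_class_for("logs/app.gz", date(2015, 1, 1), date(2015, 1, 10))
image = storage_class_for("images/a.png", date(2015, 1, 1), date(2015, 3, 1))
```

In the real service the rule is attached to the bucket and applied automatically, so no application code has to make this decision.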

Storage services: Content delivery

Amazon CloudFront

simple to use with global footprint (50+ edge locations)

streaming support

large file distribution

private content

S3, EC2, and ELB integration

geo restrictions

static and dynamic content

Instance storage: Options

Amazon EC2

Ephemeral storage (“local”)

• you manage backup/restore
• free!
• high storage instances available
  – i2.8xlarge: 6.4 TB SSD (350K IOPS)
  – d2.8xl, hs1.8xl: 48 TB disk storage

Elastic Block Store (EBS)

• “network attached storage”
• snapshot, encryption
• provisioned throughput (IOPS)
• magnetic or SSD
• up to 16 TB and 20,000 IOPS per volume

Instance storage: Build your own

Amazon EC2

NFS

MongoDB

Cassandra

GraphLab

Titan

Kafka

Lustre

Gluster

Flume

Scribe

Presto

…and more


Migrating to AWS: A proven method

Refactor and develop applications for C2S cloud securely with the following requirements: portable across infrastructure, stable platform, and agile.

elastic

stable platform

enhanced security

agile

Instance storage: Elastic File System

Amazon EFS

fully managed file system for EC2 instances

works with standard operating system APIs

sharable across thousands of instances

elastically grows to petabyte scale

delivers performance for a variety of workloads

highly available and durable

NFS v4–based
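"Works with standard operating system APIs" means that, once the file system is mounted, applications use ordinary file calls with no storage-specific SDK. The sketch below shows that pattern with a temporary directory standing in for the mount point; a path such as `/mnt/efs` would be a hypothetical example, and the file names are made up.

```python
import os
import tempfile

# Stand-in for an EFS mount point such as /mnt/efs (a hypothetical path).
# A temporary directory is used so that the sketch runs anywhere.
mount_point = tempfile.mkdtemp()

# Ordinary file operations: no storage-specific SDK is involved.
path = os.path.join(mount_point, "results", "run-001.txt")
os.makedirs(os.path.dirname(path), exist_ok=True)
with open(path, "w") as f:
    f.write("shared across instances\n")

with open(path) as f:
    contents = f.read()

size_bytes = os.path.getsize(path)
```

On a real shared file system, a second instance mounting the same file system would see the same file at the same relative path.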


We focused on changing the game

1. EFS is simple

2. EFS is elastic

3. EFS is scalable

1. EFS is simple

• fully managed
  – no hardware, network, file layer
  – create a scalable file system in seconds
• seamless integration with existing tools and apps
  – NFS v4: widespread, open
  – standard file system semantics
  – works with standard OS file system APIs
• simple pricing = simple forecasting

2. EFS is elastic

• file systems grow and shrink automatically as you add and remove files
• no need to provision storage capacity or performance
• you pay only for the storage space you use, with no minimum fee

3. EFS is scalable

• file systems can grow to petabyte scale
• throughput and IOPS scale automatically as file systems grow
• consistent low latencies regardless of file system size
• support for thousands of concurrent NFS connections

SQL stores - Managed relational DB

Amazon RDS

MySQL, Aurora, Oracle, SQL Server, PostgreSQL

backup/restore, high availability, encryption

push-button scalability

up to 3 TB and 30,000 IOPS

If you host your databases on-premises

power, HVAC, net

rack and stack

server maintenance

OS patches

DB software patches

database backups

scaling

high availability

DB software installs

OS installation

app optimization


If you choose a managed DB service

power, HVAC, net

rack and stack

server maintenance

OS patches

DB software patches

database backups

app optimization

high availability

DB software installs

OS installation

scaling


SQL stores - Petabyte data warehouse

Amazon Redshift

relational data warehouse

massively parallel

petabyte scale

fully managed

$1,000/TB/year

NoSQL - Dial up capacity

Amazon DynamoDB

NoSQL database

seamless scalability

zero admin

single-digit millisecond latency
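"Dial up capacity" refers to provisioning read and write throughput for a table rather than sizing servers. The sketch below estimates how many capacity units a workload would need. The unit sizes used (4 KB per read unit, 1 KB per write unit, eventually consistent reads at half cost) are assumptions about the provisioned-throughput model and are not taken from the slides; the function names are hypothetical.

```python
import math


def read_capacity_units(item_size_kb, reads_per_second, strongly_consistent=True):
    """Estimate read units, assuming one unit covers one strongly consistent
    read per second of an item up to 4 KB, or two eventually consistent reads."""
    units_per_read = math.ceil(item_size_kb / 4)
    total = units_per_read * reads_per_second
    return total if strongly_consistent else math.ceil(total / 2)


def write_capacity_units(item_size_kb, writes_per_second):
    """Estimate write units, assuming one unit covers one write per second
    of an item up to 1 KB."""
    return math.ceil(item_size_kb) * writes_per_second


# Hypothetical workload: 6 KB items read 100 times per second,
# and 1.5 KB items written 10 times per second.
strong_reads = read_capacity_units(6, 100)
eventual_reads = read_capacity_units(6, 100, strongly_consistent=False)
writes = write_capacity_units(1.5, 10)
```

Item sizes are rounded up per request, so keeping items small has a direct effect on the capacity to provision.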

Hadoop - Managed

Amazon Elastic MapReduce

flexible tool and framework support

- Hive, Impala, Pig, MapReduce, Presto, Spark

easy to use; fully managed

on-demand and Spot pricing

persistent and transient clusters

deep integration with S3 and other AWS services
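For readers new to the MapReduce model named among the supported frameworks, here is a minimal single-process word count showing the map, shuffle, and reduce steps. It uses no Hadoop or EMR API; the function names and input lines are illustrative, and a real cluster would run the map and reduce steps in parallel across nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in a line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


lines = ["big data is moving fast", "data is not just about size"]
pairs = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

Because each map call sees only one line and each reduce call sees only one key, both steps can be spread across many machines without coordination.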

Streaming at scale

Amazon Kinesis

real-time data collection

seamlessly scale to gigabytes/s

low-cost managed service

EMR integration

Streaming - Amazon Kinesis architecture

millions of sources producing 100s of terabytes per hour

front end: authentication, authorization

durable, highly consistent storage replicates data across three data centers (Availability Zones)

ordered stream of events supports multiple readers:

- aggregate analysis in Hadoop or data warehouse
- machine learning algorithms or sliding window analytics
- real-time dashboards and alarms
- aggregate and archive to S3

inexpensive: $0.028 per million puts
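The phrase "ordered stream of events supports multiple readers" can be pictured as an append-only log in which every reader keeps its own position. The toy in-memory model below illustrates that idea only; the `Stream` class and its methods are hypothetical and are not the Kinesis API, and durability, sharding, and authentication are left out.

```python
class Stream:
    """Toy in-memory model of an ordered, append-only event stream."""

    def __init__(self):
        self._records = []

    def put(self, data):
        """Append a record and return its sequence number."""
        self._records.append(data)
        return len(self._records) - 1

    def read(self, start, limit=10):
        """Return up to `limit` records from `start`, plus the next position."""
        batch = self._records[start:start + limit]
        return batch, start + len(batch)


stream = Stream()
for event in ["click", "view", "purchase"]:
    stream.put(event)

# Two independent readers, each tracking its own position in the stream.
dashboard_batch, dashboard_pos = stream.read(0, limit=2)
archiver_batch, archiver_pos = stream.read(0, limit=10)

# The dashboard reader resumes later from where it stopped.
dashboard_next, dashboard_pos = stream.read(dashboard_pos)
```

Reading does not remove records, which is why a dashboard, an archiver, and an analytics job can all consume the same events at their own pace.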

Machine learning made simple

Amazon Machine Learning

managed machine learning service built for developers

robust technology based on Amazon’s internal systems

create models using your existing data in AWS

deploy models to production in seconds

Smart applications by example

based on what you know about the user:

Will the user use your product?

based on what you know about an order:

Is this order fraudulent?

based on what you know about a news article:

What other articles are interesting?


Three supported types of predictions

• binary classification
  – predict the answer to a yes/no question
• multiclass classification
  – predict the correct category from a list
• regression
  – predict the value of a numeric variable
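Which of the three prediction types applies depends on the column being predicted. The sketch below picks a type from example target values using a simple heuristic chosen for illustration; it is not how Amazon Machine Learning selects a model, and `prediction_type` is a hypothetical helper.

```python
def prediction_type(target_values):
    """Pick a prediction type from the values of the target column.

    Heuristic for illustration only: two distinct values -> binary,
    all-numeric values -> regression, anything else -> multiclass.
    """
    distinct = set(target_values)
    if len(distinct) == 2:
        return "binary classification"
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in distinct):
        return "regression"
    return "multiclass classification"


fraud = prediction_type(["yes", "no", "no", "yes"])          # Is this order fraudulent?
topic = prediction_type(["sports", "politics", "tech"])      # Which category fits?
price = prediction_type([12.5, 3.0, 7.25, 19.99])            # What value to expect?
```

The three calls correspond to a yes/no question, a choice from a list of categories, and a numeric value.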

The right tool. At the right time. At the right scale.


Open data on AWS

…One more related topic


What is open data?

Open data is data that can be used by anyone for any purpose for free.

Many of our customers, such as Esri, the Weather Company, and the Climate Corporation, rely on quality open data as much as they rely on our computing, storage, and other web services.


Open data as a platform


Public datasets on AWS

To enable more innovation, AWS hosts a selection of datasets that anyone can access for free. Data in our public datasets can be accessed rapidly from our flexible, low-cost computing resources.

earth science: NASA Earth Exchange (NEX)

life sciences: 1000 Genomes project

Internet science: Common Crawl corpus

Thank You.

This presentation will be loaded to SlideShare the week following the Symposium.

http://www.slideshare.net/AmazonWebServices
