Dynamic Infrastructure and the Cloud

© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Lee Atchison ∙ Senior Director Strategic Architecture New Relic, Inc. Sydney, Australia Dynamic Infrastructure and The Cloud Adventures in Keeping Your Application Running… at Scale leeatchison @leeatchison

Upload: new-relic

Post on 11-Apr-2017


TRANSCRIPT

© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Lee Atchison ∙ Senior Director, Strategic Architecture, New Relic, Inc.

Sydney, Australia

Dynamic Infrastructure and the Cloud: Adventures in Keeping Your Application Running… at Scale

leeatchison @leeatchison

Who am I?

Lee Atchison

30 years in industry

5 at New Relic (Architect Lead, Cloud, Service Migration)

7 at Amazon Retail & AWS (Built First AppStore, AWS Elastic Beanstalk)

Specializes in:
• Cloud computing
• Services & Microservices
• Scalability, Availability

Does this sound familiar…

You had power most of the time.

Why are you complaining?

I Hope, I Hope, I Hope the Site Stays Up


Keeping Your App Running…At Scale

Availability…

…is more than you think it is.

Does this sound like something you’ve heard recently…

…an overheard Ops conversation…

The “scary” overheard conversation…

“We were wondering how changing a setting on our MySQL database might impact our performance…

… but we were worried that the change may cause our production database to fail…

… Since we didn’t want to bring down production, we decided to make the change to our backup (replica, hot standby) database instead…

… After all, it wasn’t being used for anything at the moment.”

Until, of course, the backup was needed…

This was a true story.
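One way to catch this kind of surprise is to compare the standby's settings against production before relying on it. A minimal sketch, assuming the settings have already been read into plain dictionaries; the `config_drift` helper and the values shown are hypothetical, for illustration only.

```python
def config_drift(primary, replica):
    """Return {setting: (primary_value, replica_value)} for every mismatch."""
    drift = {}
    for name in sorted(set(primary) | set(replica)):
        if primary.get(name) != replica.get(name):
            drift[name] = (primary.get(name), replica.get(name))
    return drift


# Made-up values: the replica was "temporarily" changed for an experiment.
primary_settings = {"innodb_buffer_pool_size": "8G", "max_connections": "500"}
replica_settings = {"innodb_buffer_pool_size": "2G", "max_connections": "500"}

drift = config_drift(primary_settings, replica_settings)
print(drift)
```

An empty result means the standby matches production for the settings that were compared; anything else is worth an alert before a failover depends on it.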

Availability can be more subtle, for example…

Confidential ©2008-15 New Relic, Inc. All rights reserved.

[Figures: timing examples of 300 ms, 1.5 s, and 0.9 s]

The Data from Monitoring Your App Dwarfs the Data Inside the App

User Experience

Business Outcome

Servers

Apps

Big Data Problem

High Expectations

Blame Game Intensity Rises

The problem must be someone else’s fault

Panic

What happened?

Need Data at Every Level

Typical Server / Amazon EC2 Instance
• Application & Application Microservices
• Server OS
• Hardware (virtual)

Clients: Browser / Mobile

Low Level Monitoring

Amazon CloudWatch (AWS Console)

Monitors:
• EC2 instance
• Virtualization
• Hardware
• [CPU / Disk / Networking]

Doesn’t know about:
• Server OS
• Memory / Filesystem
• Processes
• Configuration
• Application
  - Latency
  - Error rates

Infrastructure / Application Monitoring

New Relic Application Monitoring and New Relic Infrastructure Monitoring (dashboards)

Monitors (Server):
• How the OS is performing
• Configuration changes
• Processes
• Hardware

Monitors (Application):
• App health
• App performance
• Microservices

Doesn’t know about:
• Virtualization

Full Stack Monitoring

[Diagram: New Relic Application Monitoring and New Relic Infrastructure Monitoring, with Amazon CloudWatch (AWS Console) integrations, feeding dashboards]

CloudWatch monitors (AWS / CloudWatch):
• Visibility into virtualization
• CPU / Disk / Networking
• 14 AWS Services

New Relic monitors (APM):
• CPU / Disk / Networking
• Memory / Filesystem
• Processes
  - Infrastructure components
  - Configuration inventory
• Application / Microservices:
  - Latency
  - Error rates
  - App insights
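The coverage lists on these slides can be compared mechanically to see what each tool misses on its own. A minimal sketch, assuming a simplified layer model built from the slide labels; the set names and the `coverage_gaps` helper are illustrative and not part of any AWS or New Relic API.

```python
# Layers a full-stack view needs, using labels from the slides.
REQUIRED = {
    "virtualization", "cpu/disk/networking", "memory/filesystem",
    "processes", "configuration",
    "application latency", "application error rates",
}

# What each tool sees, per the slides (simplified; an assumption of this sketch).
CLOUDWATCH = {"virtualization", "cpu/disk/networking"}
NEW_RELIC = {
    "cpu/disk/networking", "memory/filesystem", "processes",
    "configuration", "application latency", "application error rates",
}


def coverage_gaps(required, *tools):
    """Return the layers that none of the given tools can see."""
    covered = set().union(*tools)
    return required - covered


print(sorted(coverage_gaps(REQUIRED, CLOUDWATCH)))
print(sorted(coverage_gaps(REQUIRED, NEW_RELIC)))
print(sorted(coverage_gaps(REQUIRED, CLOUDWATCH, NEW_RELIC)))
```

Under these assumptions each tool leaves a gap alone, and only the combination covers every layer, which is the point of the full-stack slide.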

Why Measurement Matters

Success in Software Analytics

Application Performance

Customer Experience

Business Outcome

Keeping Your App Running…At Scale

Availability…

…is more than you think it is.

Dynamic Cloud…

...make availability happen.

The Cloud Can Help

Better Data Center

Dynamic Environment

How do we use the cloud to accomplish this?


Cloud as a “Better Data Center”

Resources are allocated to uses, just like in a data center

Provisioning process is faster

Lifetime of components is relatively long

Capacity planning is still important and still applies

Why use a “Better Data Center”?

Add new capacity (faster)

Improve application availability (redundancy)

Compliance

Dynamic Cloud


Cloud as a “Dynamic Tool for Dynamic Apps”

Use only the resources you need

Allocate / de-allocate resources on the fly

Resource allocation is an integral part of your application architecture

Dynamic Cloud

Resources are:
• Allocated
• Consumed
• De-allocated

Application in charge:
• Application is aware of and is controlling traditional Ops resources
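The allocate / consume / de-allocate cycle, with the application in charge, can be sketched as a context manager. This is a minimal illustration under stated assumptions: `ResourcePool` is a hypothetical stand-in for a real provisioning API, and `leased` is a name invented for this example.

```python
from contextlib import contextmanager


class ResourcePool:
    """Hypothetical stand-in for a cloud provisioning API."""

    def __init__(self):
        self.active = set()
        self._next_id = 0

    def allocate(self):
        self._next_id += 1
        resource_id = f"worker-{self._next_id}"
        self.active.add(resource_id)
        return resource_id

    def deallocate(self, resource_id):
        self.active.discard(resource_id)


@contextmanager
def leased(pool):
    """Allocate a resource, hand it to the caller, and always release it."""
    resource_id = pool.allocate()
    try:
        yield resource_id              # the application consumes the resource here
    finally:
        pool.deallocate(resource_id)   # released even if the work raises


pool = ResourcePool()
with leased(pool) as worker:
    in_use_during = len(pool.active)
in_use_after = len(pool.active)
print(worker, in_use_during, in_use_after)
```

Tying the release to the end of the work keeps the resource lifetime as short as the task that needs it.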

Dynamic Usage Example… Docker Container Age

[Chart: container count vs. age in hours; axis marks at 1 hour, 200 days, and 833 days]

[Chart: container count by age in minutes and hours; axis mark at 1,200,000; 11% of containers live under one minute]
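A figure like "11% under one minute" is a simple ratio over observed container lifetimes. A minimal sketch, assuming lifetimes are already available in minutes; the helper name and the sample values are made up and are not the data behind the chart.

```python
def share_younger_than(ages_minutes, limit_minutes):
    """Fraction of containers whose lifetime was below limit_minutes."""
    if not ages_minutes:
        return 0.0
    short = sum(1 for age in ages_minutes if age < limit_minutes)
    return short / len(ages_minutes)


# Made-up lifetimes in minutes, for illustration only.
sample_ages = [0.2, 0.5, 3, 12, 45, 60, 240, 1440, 10080, 43200]
print(f"{share_younger_than(sample_ages, 1):.0%} lived under one minute")
```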

Dynamic Cloud Technologies

Dynamic Cloud is about scaling and availability

EC2 Auto Scaling

Mobile / IoT Dynamic routing

Load balancing

Queues and notifications

Docker
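To make the scaling idea concrete, here is a simplified proportional rule for choosing how many instances to run. It is a sketch only and not the algorithm EC2 Auto Scaling uses; the function name, the bounds, and the example numbers are assumptions.

```python
import math


def desired_capacity(current, metric, target, minimum=1, maximum=20):
    """Scale capacity in proportion to how far the metric is from its target."""
    if current < 1 or target <= 0:
        raise ValueError("current must be >= 1 and target must be > 0")
    wanted = math.ceil(current * metric / target)
    # Clamp to the allowed range so a metric spike cannot scale without bound.
    return max(minimum, min(maximum, wanted))


# Example: 4 instances at 90% average CPU with a 60% target.
print(desired_capacity(current=4, metric=90, target=60))
```

The same rule scales in when the metric falls below target, which is what lets an application use only the resources it needs.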

Dynamic Cloud Enables Better Applications Faster

• Traditional Data Center: Good
• Cloud Data Center: Better
• Dynamic Cloud: Best

The way you’ve done things in the past won’t work in the future.

Dynamic Cloud

• EC2: Server running application / processes
• Docker: Process running a command
• Lambda: Function performing a task or operation

Things happen faster because of…

Microcomputing & AWS Lambda

• Highly dynamic

• Incredibly scalable

• No infrastructure to provision

• Massively shared infrastructure

Also known as:
• Functions as a Service (FaaS)
• Compute as a Service (CaaS)
• Serverless

AWS Lambda

[Diagram: some AWS resources (S3 Bucket, DynamoDB, API Gateway, SQS) trigger a Lambda script, which runs as Lambda instances and acts on some AWS resources (S3 Bucket, API Gateway, SQS)]

• Takes an event from an AWS resource (a trigger)

• Creates an instance to execute

• Can impact original or different AWS resource

• Any number of instances can run at a time
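The trigger-then-execute flow above can be sketched as a Python handler. The `handler(event, context)` shape and the `Records` / `s3.bucket.name` / `s3.object.key` fields follow the form of S3 event notifications; the bucket and key names are made up, and the event is simulated locally rather than delivered by AWS.

```python
def handler(event, context):
    """Entry point invoked once per event; each invocation is independent."""
    touched = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        # A real function would act on another resource here (for example,
        # write to DynamoDB or send to SQS); this sketch only records the object.
        touched.append(f"{bucket}/{key}")
    return {"processed": len(touched), "objects": touched}


# Local simulation of the trigger: a trimmed-down S3 notification event.
fake_event = {
    "Records": [
        {"s3": {"bucket": {"name": "example-bucket"},
                "object": {"key": "uploads/a.png"}}}
    ]
}
result = handler(fake_event, None)
print(result)
```

Because the handler keeps no state between calls, any number of copies can run at the same time, one per incoming event.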

Dynamic Cloud

Easier Scaling

Faster Change

Faster Response

Higher Availability

Dynamic Cloud has unique monitoring requirements…

How do I track what the dynamic cloud is doing for me (or to me)?

What is a Dynamic Cloud Application?

• Application & Application Microservices
Responsible for the parts you care about

• Infrastructure
• Allocation / Provisioning
• Scaling
Let the cloud manage the rest

[Diagram: Browser / Mobile clients; multiple Application & Application Microservices instances; Provisioning; Server OS; Server (Virtual) Hardware]

Monitoring Dynamic Cloud Applications

[Diagram: the dynamic cloud application stack (Browser / Mobile, Application & Application Microservices, Provisioning, Server OS, Server (Virtual) Hardware) with its monitors]

• AWS Infrastructure: CloudWatch & AWS monitors (AWS Console)
• Application Performance: New Relic monitors (New Relic Application Monitoring, New Relic Infrastructure Monitoring)
• Integrations and dashboards

How do you monitor this?

How do you monitor this?


Where did it go? It was just here!!

The thing you monitored 10 minutes ago…

...doesn’t exist anymore!?

Monitoring the Dynamic Cloud

Monitor the Cloud Components themselves

Monitor the lifecycle of the Cloud Components

Very different from monitoring traditional Data Center components
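Monitoring the lifecycle means keeping a record of a component after it is gone. A minimal sketch, assuming start and stop events arrive with timestamps in seconds; `LifecycleTracker` and the component names are hypothetical and not tied to any monitoring product.

```python
class LifecycleTracker:
    """Keeps a record of short-lived components after they disappear."""

    def __init__(self):
        self.started = {}   # component id -> start time (seconds)
        self.finished = {}  # component id -> lifetime (seconds)

    def on_start(self, component_id, now):
        self.started[component_id] = now

    def on_stop(self, component_id, now):
        start = self.started.pop(component_id, None)
        if start is not None:
            self.finished[component_id] = now - start

    def running(self):
        return sorted(self.started)

    def mean_lifetime(self):
        if not self.finished:
            return None
        return sum(self.finished.values()) / len(self.finished)


tracker = LifecycleTracker()
tracker.on_start("container-a", now=0)
tracker.on_start("container-b", now=10)
tracker.on_stop("container-a", now=40)   # lived 40 s
tracker.on_stop("container-b", now=30)   # lived 20 s
tracker.on_start("container-c", now=35)

print(tracker.running(), tracker.mean_lifetime())
```

The thing monitored ten minutes ago may no longer exist, but its lifetime and the churn rate are still available to answer "what happened?".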

Changing World

Previous - STATIC World: Ops

Now - DYNAMIC World: Dev, Ops

• We know:
  - Change is inevitable
• We must:
  - Embrace and drive change
• Enabling:
  - Quicker growth
  - More reliable growth


Keeping Your App Running…At Scale

Dynamic Cloud…

...make availability happen.

Migration…

...how do I get my app to the cloud?

High Expectations

Blame Game Intensity Rises

“The problem must be the cloud’s fault”

Pressure to declare victory in the migration

The Politics of Migration

Show me the new apps!!?

Promised performance gains? Cost controls? Optimize costs?

Why is it taking so long?

Migration failure…

Ops

Use the Cloud

• Move in a controlled way
• Learn as you go
• Measure everything

Does not have to be painful…

Progressions in Cloud Adoption… The Controlled Way

Standard steps most companies follow:
• Experiment
• Secure the Cloud
• Enable Servers, Enable SaaS
• Enable Value-Added Services
• Enable Unique Services
• Mandate Cloud Usage

Enterprise IT Cloud Adoption Strategy

Experiment: “What is this cloud thing?”
• Non-invasive, safe technologies
  - S3
  - Perhaps: CloudFront, SQS, SES
• Stay away from EC2/Servers
• Security: Easy as one-offs
• No “Policies” implemented yet
• “Just seeing what this is all about”

Secure the Cloud: “Can we trust the cloud?”
• IAM (Credentials)
• VPC (Secure network)
• AWS Direct Connect (just another data center)
• Cloud policies begin to be formed
• All parts of the company are now involved
• Critical evolution point

Enable Servers, Enable SaaS: “The cloud seems to work pretty well…”
• EC2
  - Basic “data center migration”
  - Just another server type available…
• Multiple AZs/Regions
  - Part of multi-datacenter resiliency strategy
• Independently: SaaS usage increases
  - Non-critical or internal uses first

Enable Value-Added Services: “Dynamic Cloud becomes a thing…”
• Managed Databases
  - RDS, Aurora
• Other Managed Services
  - Elastic Beanstalk, SES, SQS, ElasticSearch

Enable Unique Services: “Dynamic Cloud is deeply ingrained…”
• High value, Cloud-specific services
  - Lambda, Kinesis
  - DynamoDB
  - SWF, Elastic Transcoder
  - Redshift
• Point of commitment...
• ...dependent on cloud

Mandate Cloud Usage: “Why do we need our own data centers?”
• Cloud as a data center replacement
• Company is now “all in” with cloud
• Netflix…

What is the cloud?

Can we trust the cloud?

The cloud works pretty well…

Dynamic Cloud becomes a thing…

Dynamic Cloud is deeply ingrained…

Why do we need our own data centers?

Progressions in Cloud Adoption: The steps aren’t easy…

Different Companies

Different Speed

Different Needs

Cloud Adoption Strategies

Enterprise IT Cloud Adoption Strategy:
• Experiment
• Secure the Cloud
• Enable Servers, Enable SaaS
• Enable Value-Added Services
• Enable Unique Services
• Mandate Cloud Usage

Application Cloud Adoption Strategy:
• Experiment / Peripheral Usage
• Cloud Servers
• Managed Components
• Unique Components
• Application Cloud Committed

[Chart: Cloud Adoption, plotting Corporate Adoption (Experiment, Secure, Allow Servers, Allow SaaS, Allow Value-Added, Committed, Mandate) against Application Adoption (Experiment, Servers, Managed Components, Unique Components, Committed)]

Services placed on the chart: S3, IAM, VPC, Non-Integral SaaS, EC2, Integral SaaS, RDS, SES, Lambda, Kinesis

Path labels on the chart: First Steps, Step #1, Step #2, Step #3, Step #4

Application types on the chart: Non-Critical / Internal Applications, New Applications, Application Re-Writes, Critical Applications

Regions marked on the chart: Adoption Sweet Spot, Cloud Adoption Center of Gravity

Migrating to the Cloud…How can I be successful?

Adoption Success Strategies

Understand where your culture is

Consciously plan your acceptance

Drive your cultural change to your desired level

Monitor your adoption

Understand your needs

Monitor Your Adoption

Before Migration

Baseline application (servers, databases, caches, applications, microservices)

Determine your steady state

Monitor Your Adoption

During Migration

Incorporate cloud’s internal monitoring

Continue application monitoring

Understand and solve all deviations from steady state…
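The baseline-then-compare approach can be sketched with a mean and standard deviation taken before migration. This is a minimal illustration, assuming a single metric and a three-sigma tolerance; the function names, the tolerance, and the sample response times are made up.

```python
from statistics import mean, stdev


def baseline(samples):
    """Summarize pre-migration samples as (mean, standard deviation)."""
    return mean(samples), stdev(samples)


def deviations(samples, base, tolerance=3.0):
    """Return samples further than `tolerance` standard deviations from the mean."""
    center, spread = base
    return [s for s in samples if abs(s - center) > tolerance * spread]


# Made-up response times in milliseconds, for illustration only.
before = [300, 310, 295, 305, 290, 300]   # steady state before migration
during = [302, 298, 900, 305, 1500]       # observed during migration

steady_state = baseline(before)
flagged = deviations(during, steady_state)
print(steady_state, flagged)
```

Each flagged sample is a deviation from steady state to understand and resolve before the migration is declared complete.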

The Biggest Role Monitoring Plays In Migration

Performance Post Migration & During Optimization

Pre-migration Feasibility & Benchmarking

Continue Monitoring…

Infrastructure is now out of your control

Some cloud-specific concerns (EC2 instance failures, instance degradation)

Dynamic Technologies Impact Our Applications

Understand application impact

Ongoing application & infrastructure monitoring is essential

Fairfax Media Limited is a leading multi-platform media company in Australasia, reaching 10.6 million Australians and 2.9 million New Zealanders.

Media/Entertainment

“Because we monitored our on-premises systems with New Relic before we migrated them to Amazon Web Services, we were able to identify potential issues and fix them during the migration process.”

- Cheesun Choong, Head of Product Platforms

Results:
• Reduced diagnosis time from hours to minutes
• Migrated to AWS with confidence
• Identified underutilized servers to save money

Keeping Your App Running…At Scale

Dynamic Cloud…

...make availability happen.

Migration…

...how do I get my app to the cloud?

Availability…

…is more than you think it is.

Monitor your application and infrastructure

Monitoring just the server

[Diagram: EC2 Instance (Application & Application Microservices, Server OS, Server (Virtual) Hardware) monitored through CloudWatch in the AWS Console]

Worked when rate of change was low…

Dynamic World (Dev, Ops)

Full Stack Monitoring

[Diagram: Browser / Mobile clients, multiple Application & Application Microservices instances, Provisioning, Server OS, and Server (Virtual) Hardware, monitored by New Relic Application Monitoring and New Relic Infrastructure Monitoring with dashboards]

You need:
• Top to bottom monitoring…
• Full stack accountability...
• Dynamic infrastructure control...

Digital Fan Experience for Major League Baseball

New Relic empowers our developers to experiment and work fast without compromising on the quality of the MLB fan experience.

– Sean Curtis, Senior Vice President of Engineering

Panic

Change is speeding up

Dynamic Cloud enables better applications faster.

• Traditional Data Center: Good
• Cloud Data Center: Better
• Dynamic Cloud: Best

The way you’ve done things in the past won’t work in the future.


Thank you

Lee Atchison ∙ Senior Director, Strategic Architecture, New Relic

Architecting for Scale
By: Lee Atchison
Published by: O’Reilly Media
www.architectingforscale.com

leeatchison @leeatchison

This document and the information herein (including any information that may be incorporated by reference) is provided for informational purposes only and should not be construed as an offer, commitment, promise or obligation on behalf of New Relic, Inc. (“New Relic”) to sell securities or deliver any product, material, code, functionality, or other feature. Any information provided hereby is proprietary to New Relic and may not be replicated or disclosed without New Relic’s express written permission.

Such information may contain forward-looking statements within the meaning of federal securities laws. Any statement that is not a historical fact or refers to expectations, projections, future plans, objectives, estimates, goals, or other characterizations of future events is a forward-looking statement. These forward-looking statements can often be identified as such because the context of the statement will include words such as “believes,” “anticipates,” “expects” or words of similar import.

Actual results may differ materially from those expressed in these forward-looking statements, which speak only as of the date hereof, and are subject to change at any time without notice. Existing and prospective investors, customers and other third parties transacting business with New Relic are cautioned not to place undue reliance on this forward-looking information. The achievement or success of the matters covered by such forward-looking statements are based on New Relic’s current assumptions, expectations, and beliefs and are subject to substantial risks, uncertainties, assumptions, and changes in circumstances that may cause the actual results, performance, or achievements to differ materially from those expressed or implied in any forward-looking statement. Further information on factors that could affect such forward-looking statements is included in the filings we make with the SEC from time to time. Copies of these documents may be obtained by visiting New Relic’s Investor Relations website at http://ir.newrelic.com or the SEC’s website at www.sec.gov.

New Relic assumes no obligation and does not intend to update these forward-looking statements, except as required by law. New Relic makes no warranties, expressed or implied, in this document or otherwise, with respect to the information provided.

Safe Harbor