avoiding #cloudfail: learning lessons from microsoft...

68

Upload: others

Post on 24-Aug-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Avoiding #CloudFail: Learning Lessons from Microsoft Azuredownload.microsoft.com/download/6/1/C/61C0E37C-F252-4B33-955… · Actual: if direct SQL DB Calls (~10X) Time Actual Page
Page 2: Avoiding #CloudFail: Learning Lessons from Microsoft Azuredownload.microsoft.com/download/6/1/C/61C0E37C-F252-4B33-955… · Actual: if direct SQL DB Calls (~10X) Time Actual Page
Page 3: Avoiding #CloudFail: Learning Lessons from Microsoft Azuredownload.microsoft.com/download/6/1/C/61C0E37C-F252-4B33-955… · Actual: if direct SQL DB Calls (~10X) Time Actual Page
Page 4: Avoiding #CloudFail: Learning Lessons from Microsoft Azuredownload.microsoft.com/download/6/1/C/61C0E37C-F252-4B33-955… · Actual: if direct SQL DB Calls (~10X) Time Actual Page
Page 5: Avoiding #CloudFail: Learning Lessons from Microsoft Azuredownload.microsoft.com/download/6/1/C/61C0E37C-F252-4B33-955… · Actual: if direct SQL DB Calls (~10X) Time Actual Page
Page 6: Avoiding #CloudFail: Learning Lessons from Microsoft Azuredownload.microsoft.com/download/6/1/C/61C0E37C-F252-4B33-955… · Actual: if direct SQL DB Calls (~10X) Time Actual Page

…Web Role

Cache

Worker

Roles

…AzureBlob

WCFClient

WCFClient

WCFClient

WCFServer

WCFServer

WCFServer

Page 7: Avoiding #CloudFail: Learning Lessons from Microsoft Azuredownload.microsoft.com/download/6/1/C/61C0E37C-F252-4B33-955… · Actual: if direct SQL DB Calls (~10X) Time Actual Page

…Web Role

Cache

Worker

Roles

…AzureBlob

WCFClient

WCFClient

WCFClient

WCFServer

WCFServer

WCFServer

Cache Cache Cache

Page 8: Avoiding #CloudFail: Learning Lessons from Microsoft Azuredownload.microsoft.com/download/6/1/C/61C0E37C-F252-4B33-955… · Actual: if direct SQL DB Calls (~10X) Time Actual Page
Page 9: Avoiding #CloudFail: Learning Lessons from Microsoft Azuredownload.microsoft.com/download/6/1/C/61C0E37C-F252-4B33-955… · Actual: if direct SQL DB Calls (~10X) Time Actual Page
Page 10: Avoiding #CloudFail: Learning Lessons from Microsoft Azuredownload.microsoft.com/download/6/1/C/61C0E37C-F252-4B33-955… · Actual: if direct SQL DB Calls (~10X) Time Actual Page

…Web Role

…Azure

Blob

Azure DB Results Main

WorkerRole

Page 11: Avoiding #CloudFail: Learning Lessons from Microsoft Azuredownload.microsoft.com/download/6/1/C/61C0E37C-F252-4B33-955… · Actual: if direct SQL DB Calls (~10X) Time Actual Page

…Web Role

…Azure

Blob

Azure DB Results Main

WorkerRole

…Web Role

…Azure

Blob

Azure DB Results Main

WorkerRole

Azure Traffic Manager

Page 12: Avoiding #CloudFail: Learning Lessons from Microsoft Azuredownload.microsoft.com/download/6/1/C/61C0E37C-F252-4B33-955… · Actual: if direct SQL DB Calls (~10X) Time Actual Page

Expected Load

Scenarios Expected Page Views

Time Window

(hrs) Page View/sec

10X/pvs

DB Calls/sec

Average 10,000,000 4 694 6,944

Peak Hour 6,000,000 1 1,667 16,667

Page 13: Avoiding #CloudFail: Learning Lessons from Microsoft Azuredownload.microsoft.com/download/6/1/C/61C0E37C-F252-4B33-955… · Actual: if direct SQL DB Calls (~10X) Time Actual Page

1 2 n…Web Role

…Azure

Blob

Azure DB

Results MainWorkerRole

CacheWorkerRole

Page 14: Avoiding #CloudFail: Learning Lessons from Microsoft Azuredownload.microsoft.com/download/6/1/C/61C0E37C-F252-4B33-955… · Actual: if direct SQL DB Calls (~10X) Time Actual Page
Page 15: Avoiding #CloudFail: Learning Lessons from Microsoft Azuredownload.microsoft.com/download/6/1/C/61C0E37C-F252-4B33-955… · Actual: if direct SQL DB Calls (~10X) Time Actual Page

Actual: if direct SQL DB Calls (~10X)

Time Actual Page Views Time Window (sec) Page View/sec

Possible 10X

DB Calls/sec

Est Capacity

DIFF DB Calls/sec

8pm+10 secs 448,932 10 44,893 448,932 (447,932)

8pm+30 secs 206,925 20 10,346 103,463 (102,463)

8:01 pm 171,231 30 5,708 57,077 (56,077)

8:03 pm 378,350 120 3,153 31,529 (30,529)

8:10 pm 494,423 420 1,177 11,772 (10,772)

8:30 pm 416,379 1200 347 3,470 (2,470)

Actual: using Azure Cache (10X)

Time Actual Page Views Time Window (sec) Page View/sec

Actual 10X

Cache

Calls/sec

Est Capacity

DIFF Cache Calls/sec

8pm+10 secs 448,932 10 44,893 448,932 (288,932)

8pm+30 secs 206,925 20 10,346 103,463 56,538

8:01 pm 171,231 30 5,708 57,077 102,923

8:03 pm 378,350 120 3,153 31,529 128,471

8:10 pm 494,423 420 1,177 11,772 148,228

8:30 pm 416,379 1200 347 3,470 156,530

Page 16: Avoiding #CloudFail: Learning Lessons from Microsoft Azuredownload.microsoft.com/download/6/1/C/61C0E37C-F252-4B33-955… · Actual: if direct SQL DB Calls (~10X) Time Actual Page
Page 17: Avoiding #CloudFail: Learning Lessons from Microsoft Azuredownload.microsoft.com/download/6/1/C/61C0E37C-F252-4B33-955… · Actual: if direct SQL DB Calls (~10X) Time Actual Page
Page 18: Avoiding #CloudFail: Learning Lessons from Microsoft Azuredownload.microsoft.com/download/6/1/C/61C0E37C-F252-4B33-955… · Actual: if direct SQL DB Calls (~10X) Time Actual Page
Page 19: Avoiding #CloudFail: Learning Lessons from Microsoft Azuredownload.microsoft.com/download/6/1/C/61C0E37C-F252-4B33-955… · Actual: if direct SQL DB Calls (~10X) Time Actual Page

1 2 n…Web Role

IaaS

SQL

ApplicationWorkerRole

200 User Test

On-Prem App Azure

Total Counts For Test 1 (batch): 33,174 10,135

Total Counts For Test 2 (interactive): 114,497 39,341

Page 20: Avoiding #CloudFail: Learning Lessons from Microsoft Azuredownload.microsoft.com/download/6/1/C/61C0E37C-F252-4B33-955… · Actual: if direct SQL DB Calls (~10X) Time Actual Page

Data.VHD Tempdb.VHD Data+Tempdb.VHD

Page 21: Avoiding #CloudFail: Learning Lessons from Microsoft Azuredownload.microsoft.com/download/6/1/C/61C0E37C-F252-4B33-955… · Actual: if direct SQL DB Calls (~10X) Time Actual Page
Page 22: Avoiding #CloudFail: Learning Lessons from Microsoft Azuredownload.microsoft.com/download/6/1/C/61C0E37C-F252-4B33-955… · Actual: if direct SQL DB Calls (~10X) Time Actual Page
Page 23: Avoiding #CloudFail: Learning Lessons from Microsoft Azuredownload.microsoft.com/download/6/1/C/61C0E37C-F252-4B33-955… · Actual: if direct SQL DB Calls (~10X) Time Actual Page

…Azure

Blob…

IngestionWorkerRole

Azure DB Results

Page 24: Avoiding #CloudFail: Learning Lessons from Microsoft Azuredownload.microsoft.com/download/6/1/C/61C0E37C-F252-4B33-955… · Actual: if direct SQL DB Calls (~10X) Time Actual Page
Page 25: Avoiding #CloudFail: Learning Lessons from Microsoft Azuredownload.microsoft.com/download/6/1/C/61C0E37C-F252-4B33-955… · Actual: if direct SQL DB Calls (~10X) Time Actual Page

…Azure

Blob…

IngestionWorkerRole

Azure DB

Today

Today-1

Today-6

Today-7X

Aggregate

Page 26: Avoiding #CloudFail: Learning Lessons from Microsoft Azuredownload.microsoft.com/download/6/1/C/61C0E37C-F252-4B33-955… · Actual: if direct SQL DB Calls (~10X) Time Actual Page
Page 27: Avoiding #CloudFail: Learning Lessons from Microsoft Azuredownload.microsoft.com/download/6/1/C/61C0E37C-F252-4B33-955… · Actual: if direct SQL DB Calls (~10X) Time Actual Page

AdminWeb Role

User ServicesWeb Role

…AzureBlob

DB

Message Processor WorkerRole

NotificationsWorkerRole

Dedicated Cache WorkerRole

OperationsService Bus Queue

Page 28: Avoiding #CloudFail: Learning Lessons from Microsoft Azuredownload.microsoft.com/download/6/1/C/61C0E37C-F252-4B33-955… · Actual: if direct SQL DB Calls (~10X) Time Actual Page

MessageProcessor

Message2

Message1

Message3

MessageProcessor

Message2

Message1

Message3

Page 29: Avoiding #CloudFail: Learning Lessons from Microsoft Azuredownload.microsoft.com/download/6/1/C/61C0E37C-F252-4B33-955… · Actual: if direct SQL DB Calls (~10X) Time Actual Page

http://msdn.microsoft.com/en-us/library/azure/hh528527.aspx

MessageProcessor

Message2

Message1

Message3

Processed Message

1

Processed Message

3

Processed Message

2

MessageProcessor

Message2

Message1

Message3

Processed Message

1

Processed Message

2

Processed Message

3

Page 30: Avoiding #CloudFail: Learning Lessons from Microsoft Azuredownload.microsoft.com/download/6/1/C/61C0E37C-F252-4B33-955… · Actual: if direct SQL DB Calls (~10X) Time Actual Page
Page 31: Avoiding #CloudFail: Learning Lessons from Microsoft Azuredownload.microsoft.com/download/6/1/C/61C0E37C-F252-4B33-955… · Actual: if direct SQL DB Calls (~10X) Time Actual Page
Page 32: Avoiding #CloudFail: Learning Lessons from Microsoft Azuredownload.microsoft.com/download/6/1/C/61C0E37C-F252-4B33-955… · Actual: if direct SQL DB Calls (~10X) Time Actual Page

HVAC

Thermostat

Thermostat

Bu

ildin

g R

ou

ter

Web Role

(Synchronous HTTP)

Web Role

Mobile User Interface

Azure

On-Prem

Gateway

Weather WCF

Service

Azure Queue Email Queue

SMTP Relay

Dealer Email

Page 33: Avoiding #CloudFail: Learning Lessons from Microsoft Azuredownload.microsoft.com/download/6/1/C/61C0E37C-F252-4B33-955… · Actual: if direct SQL DB Calls (~10X) Time Actual Page

12 A3 VMs

Page 34: Avoiding #CloudFail: Learning Lessons from Microsoft Azuredownload.microsoft.com/download/6/1/C/61C0E37C-F252-4B33-955… · Actual: if direct SQL DB Calls (~10X) Time Actual Page
Page 35: Avoiding #CloudFail: Learning Lessons from Microsoft Azuredownload.microsoft.com/download/6/1/C/61C0E37C-F252-4B33-955… · Actual: if direct SQL DB Calls (~10X) Time Actual Page

HVAC

Thermostat

Thermostat

Bu

ildin

g R

ou

ter

Web Role

(Synchronous HTTP)

Web Role

Mobile User Interface

Service Bus

Email Queue

Azure

On-Prem

Gateway

SMTP RelayWeather WCF

Service

Async and Writes

Sync and

Reads

Worker Role

(write-leveling)

Em

ail R

eq

uest

Page 36: Avoiding #CloudFail: Learning Lessons from Microsoft Azuredownload.microsoft.com/download/6/1/C/61C0E37C-F252-4B33-955… · Actual: if direct SQL DB Calls (~10X) Time Actual Page

4 A3 VMs

Page 37: Avoiding #CloudFail: Learning Lessons from Microsoft Azuredownload.microsoft.com/download/6/1/C/61C0E37C-F252-4B33-955… · Actual: if direct SQL DB Calls (~10X) Time Actual Page
Page 38: Avoiding #CloudFail: Learning Lessons from Microsoft Azuredownload.microsoft.com/download/6/1/C/61C0E37C-F252-4B33-955… · Actual: if direct SQL DB Calls (~10X) Time Actual Page
Page 39: Avoiding #CloudFail: Learning Lessons from Microsoft Azuredownload.microsoft.com/download/6/1/C/61C0E37C-F252-4B33-955… · Actual: if direct SQL DB Calls (~10X) Time Actual Page
Page 40: Avoiding #CloudFail: Learning Lessons from Microsoft Azuredownload.microsoft.com/download/6/1/C/61C0E37C-F252-4B33-955… · Actual: if direct SQL DB Calls (~10X) Time Actual Page
Page 41: Avoiding #CloudFail: Learning Lessons from Microsoft Azuredownload.microsoft.com/download/6/1/C/61C0E37C-F252-4B33-955… · Actual: if direct SQL DB Calls (~10X) Time Actual Page
Page 42: Avoiding #CloudFail: Learning Lessons from Microsoft Azuredownload.microsoft.com/download/6/1/C/61C0E37C-F252-4B33-955… · Actual: if direct SQL DB Calls (~10X) Time Actual Page
Page 43: Avoiding #CloudFail: Learning Lessons from Microsoft Azuredownload.microsoft.com/download/6/1/C/61C0E37C-F252-4B33-955… · Actual: if direct SQL DB Calls (~10X) Time Actual Page

Gateway

Service 1

User profile

Service 9

Image processing

ClientSQL DB

Monolithic, Synchronous and MonolingualWeb role (single instance)

……

….

Page 44: Avoiding #CloudFail: Learning Lessons from Microsoft Azuredownload.microsoft.com/download/6/1/C/61C0E37C-F252-4B33-955… · Actual: if direct SQL DB Calls (~10X) Time Actual Page
Page 45: Avoiding #CloudFail: Learning Lessons from Microsoft Azuredownload.microsoft.com/download/6/1/C/61C0E37C-F252-4B33-955… · Actual: if direct SQL DB Calls (~10X) Time Actual Page

Gateway

User profile service

Image processing

service

Client

SQL DB

Distributed, Asynchronous and Polyglot

Image

processing

Decompose workloads into different roles

Use queue and worker for CPU intensive task

Worker roleWeb role

Web role Web role

Web role

Other servicesTable

Storage

Cache

Blob

Storage

Optimal storage per each data type

Async calls all around

Cache

Page 46: Avoiding #CloudFail: Learning Lessons from Microsoft Azuredownload.microsoft.com/download/6/1/C/61C0E37C-F252-4B33-955… · Actual: if direct SQL DB Calls (~10X) Time Actual Page
Page 47: Avoiding #CloudFail: Learning Lessons from Microsoft Azuredownload.microsoft.com/download/6/1/C/61C0E37C-F252-4B33-955… · Actual: if direct SQL DB Calls (~10X) Time Actual Page
Page 48: Avoiding #CloudFail: Learning Lessons from Microsoft Azuredownload.microsoft.com/download/6/1/C/61C0E37C-F252-4B33-955… · Actual: if direct SQL DB Calls (~10X) Time Actual Page
Page 49: Avoiding #CloudFail: Learning Lessons from Microsoft Azuredownload.microsoft.com/download/6/1/C/61C0E37C-F252-4B33-955… · Actual: if direct SQL DB Calls (~10X) Time Actual Page
Page 50: Avoiding #CloudFail: Learning Lessons from Microsoft Azuredownload.microsoft.com/download/6/1/C/61C0E37C-F252-4B33-955… · Actual: if direct SQL DB Calls (~10X) Time Actual Page

Role A Role B

Port

80

Port

3389

Port

3390

Role A’ Role B’

Port

80

Port

3389

Port

3390

Production VIP – VIP1

<dnsname>.cloudapp.netStaging VIP – VIP2

<guid>.cloudapp.net

Page 51: Avoiding #CloudFail: Learning Lessons from Microsoft Azuredownload.microsoft.com/download/6/1/C/61C0E37C-F252-4B33-955… · Actual: if direct SQL DB Calls (~10X) Time Actual Page

Slot VIP Role A Role B

Production 168.133.1.22 Healthy Healthy

Stage 168.124.33.22 Healthy Healthy

Slot VIP Role A Role B

Stage 168.124.33.22 Healthy Healthy

Production 168.133.1.22 Healthy Healthy

Page 52: Avoiding #CloudFail: Learning Lessons from Microsoft Azuredownload.microsoft.com/download/6/1/C/61C0E37C-F252-4B33-955… · Actual: if direct SQL DB Calls (~10X) Time Actual Page

Slot VIP Role A Role B

Production 168.133.1.22 Healthy Healthy

Stage 168.124.33.22 Healthy Healthy

Slot VIP Role A Role B

Stage 168.124.33.22 Healthy Healthy

Production 168.133.1.22 Healthy Healthy

Slot VIP Role A Role B

Stage 168.124.33.22 Healthy Healthy

Stage 168.124.33.22 Healthy Healthy

Page 53: Avoiding #CloudFail: Learning Lessons from Microsoft Azuredownload.microsoft.com/download/6/1/C/61C0E37C-F252-4B33-955… · Actual: if direct SQL DB Calls (~10X) Time Actual Page

RDFE A RDFE vNextRDFE B

Page 54: Avoiding #CloudFail: Learning Lessons from Microsoft Azuredownload.microsoft.com/download/6/1/C/61C0E37C-F252-4B33-955… · Actual: if direct SQL DB Calls (~10X) Time Actual Page

RDFE A RDFE vNextRDFE B

Page 55: Avoiding #CloudFail: Learning Lessons from Microsoft Azuredownload.microsoft.com/download/6/1/C/61C0E37C-F252-4B33-955… · Actual: if direct SQL DB Calls (~10X) Time Actual Page

RDFE vNextRDFE B

Page 56: Avoiding #CloudFail: Learning Lessons from Microsoft Azuredownload.microsoft.com/download/6/1/C/61C0E37C-F252-4B33-955… · Actual: if direct SQL DB Calls (~10X) Time Actual Page

RDFE vNextRDFE vNext

Page 57: Avoiding #CloudFail: Learning Lessons from Microsoft Azuredownload.microsoft.com/download/6/1/C/61C0E37C-F252-4B33-955… · Actual: if direct SQL DB Calls (~10X) Time Actual Page

http://blogs.msdn.com/b/windowsazure/archive/2013/03/01/details-of-the-february-22nd-2013-windows-azure-storage-

disruption.aspx

Page 58: Avoiding #CloudFail: Learning Lessons from Microsoft Azuredownload.microsoft.com/download/6/1/C/61C0E37C-F252-4B33-955… · Actual: if direct SQL DB Calls (~10X) Time Actual Page
Page 59: Avoiding #CloudFail: Learning Lessons from Microsoft Azuredownload.microsoft.com/download/6/1/C/61C0E37C-F252-4B33-955… · Actual: if direct SQL DB Calls (~10X) Time Actual Page
Page 60: Avoiding #CloudFail: Learning Lessons from Microsoft Azuredownload.microsoft.com/download/6/1/C/61C0E37C-F252-4B33-955… · Actual: if direct SQL DB Calls (~10X) Time Actual Page
Page 61: Avoiding #CloudFail: Learning Lessons from Microsoft Azuredownload.microsoft.com/download/6/1/C/61C0E37C-F252-4B33-955… · Actual: if direct SQL DB Calls (~10X) Time Actual Page

System.Reflection.TargetInvocationException: Exception has been thrown by the target of an

invocation. ---> Microsoft.ServiceModel.Web.WebProtocolException: Server Error: The service

name is unknown (NotFound)

HTTP Status Code: 400. Service Management Error Code: MissingOrIncorrectVersionHeader. Message: The versioning header is not specified or was specified incorrectly.

Page 62: Avoiding #CloudFail: Learning Lessons from Microsoft Azuredownload.microsoft.com/download/6/1/C/61C0E37C-F252-4B33-955… · Actual: if direct SQL DB Calls (~10X) Time Actual Page
Page 63: Avoiding #CloudFail: Learning Lessons from Microsoft Azuredownload.microsoft.com/download/6/1/C/61C0E37C-F252-4B33-955… · Actual: if direct SQL DB Calls (~10X) Time Actual Page
Page 64: Avoiding #CloudFail: Learning Lessons from Microsoft Azuredownload.microsoft.com/download/6/1/C/61C0E37C-F252-4B33-955… · Actual: if direct SQL DB Calls (~10X) Time Actual Page
Page 65: Avoiding #CloudFail: Learning Lessons from Microsoft Azuredownload.microsoft.com/download/6/1/C/61C0E37C-F252-4B33-955… · Actual: if direct SQL DB Calls (~10X) Time Actual Page
Page 66: Avoiding #CloudFail: Learning Lessons from Microsoft Azuredownload.microsoft.com/download/6/1/C/61C0E37C-F252-4B33-955… · Actual: if direct SQL DB Calls (~10X) Time Actual Page
Page 67: Avoiding #CloudFail: Learning Lessons from Microsoft Azuredownload.microsoft.com/download/6/1/C/61C0E37C-F252-4B33-955… · Actual: if direct SQL DB Calls (~10X) Time Actual Page
Page 68: Avoiding #CloudFail: Learning Lessons from Microsoft Azuredownload.microsoft.com/download/6/1/C/61C0E37C-F252-4B33-955… · Actual: if direct SQL DB Calls (~10X) Time Actual Page