Cloud patterns applied

Upload: lars-fronius

Post on 05-Aug-2015

Page 1: Cloud patterns applied
Page 2: Cloud patterns applied

Cloud patterns applied

Making the most of EC2 at EyeEm

Page 3: Cloud patterns applied

ME

• Site Reliability Engineer at EyeEm

• How do computers even work?

• Started as an operations guy in a scientific datacenter

• Now mostly developing and making users and developers happy

@LarsFronius

Page 4: Cloud patterns applied

Resilience, Development, Culture.

Page 5: Cloud patterns applied

“If you think you can prevent failure, then you aren’t developing your ability to respond.”

—Paul Hammond

Page 11: Cloud patterns applied

• Keep application state on as few machines as possible

• Test restores of stateful machines

• …all the time.
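
A restore test can be as small as "load the backup into a fresh instance and compare it with the source". The sketch below illustrates that idea only; SQLite stands in for the real datastore, and the `photos` table and the function name are made up for the example.

```python
import sqlite3


def restore_matches_source(source: sqlite3.Connection) -> bool:
    """Dump `source`, restore the dump into a fresh database, compare rows."""
    dump = "\n".join(source.iterdump())        # the "backup"
    restored = sqlite3.connect(":memory:")     # a throwaway restore target
    try:
        restored.executescript(dump)           # the "restore"
        query = "SELECT id, caption FROM photos ORDER BY id"
        return restored.execute(query).fetchall() == source.execute(query).fetchall()
    finally:
        restored.close()


# Hypothetical sample dataset.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE photos (id INTEGER PRIMARY KEY, caption TEXT)")
source.executemany("INSERT INTO photos (caption) VALUES (?)",
                   [("sunset",), ("harbour",)])
source.commit()

print("restore ok:", restore_matches_source(source))
```

Running a check like this on a schedule, rather than once, is what "all the time" amounts to.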

Page 13: Cloud patterns applied

• Throw away stateless servers

• Make sure they come back up with their expected behaviour

Page 14: Cloud patterns applied

“The goal of operations is to have every day be just another boring day. Achieving this boredom depends on foreseeing the future performance of the system and making adjustments accordingly.”

—John Allspaw, Richard Cook

Page 16: Cloud patterns applied

• Distributed datastores

• Many small servers rather than a few big ones

Page 17: Cloud patterns applied

Three instances: 33% 33% 33%

Page 19: Cloud patterns applied

One instance down: 50% 50% (the third instance's 33% is redistributed)

50% traffic increase on a single instance

Page 20: Cloud patterns applied

Eight instances: 12.5% 12.5% 12.5% 12.5% 12.5% 12.5% 12.5% 12.5%

Page 23: Cloud patterns applied

One instance down: 14.3% 14.3% 14.3% 14.3% 14.3% 14.3% 14.3% (the eighth instance's 12.5% is redistributed)

14.3% traffic increase on a single instance
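
The arithmetic behind these two slides can be checked with a short sketch. It assumes traffic is spread evenly and that the survivors share the failed instance's load equally; the function name is made up for the example.

```python
from fractions import Fraction


def load_increase_after_failure(instances: int, failed: int = 1) -> Fraction:
    """Relative extra traffic per surviving instance when `failed` of
    `instances` evenly loaded servers drop out."""
    if not 0 <= failed < instances:
        raise ValueError("need at least one surviving instance")
    before = Fraction(1, instances)           # share per instance, all healthy
    after = Fraction(1, instances - failed)   # share per surviving instance
    return after / before - 1


for n in (3, 8):
    increase = load_increase_after_failure(n)
    print(f"{n} instances: +{float(increase):.1%} per surviving instance")
```

With three instances each survivor takes 50% more traffic; with eight it is 1/7, roughly 14.3%. In general, losing one of n instances costs each survivor 1/(n-1) extra.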

Page 26: Cloud patterns applied

• Mark endpoints dead

• Test timeouts
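
One way to read these two bullets is a client that takes an endpoint out of rotation when it times out and tries it again after a cooldown. The sketch below is not from the talk; the class, its parameters, and the 30-second default are illustrative assumptions.

```python
import time


class EndpointPool:
    """Tracks which endpoints are currently considered alive."""

    def __init__(self, endpoints, cooldown=30.0, clock=time.monotonic):
        self._endpoints = list(endpoints)
        self._cooldown = cooldown      # seconds an endpoint stays marked dead
        self._clock = clock            # injectable so timeouts can be tested
        self._dead_until = {}

    def alive(self):
        now = self._clock()
        return [e for e in self._endpoints
                if self._dead_until.get(e, now) <= now]

    def mark_dead(self, endpoint):
        self._dead_until[endpoint] = self._clock() + self._cooldown

    def call(self, request):
        """Try each live endpoint in turn; mark those that time out as dead."""
        for endpoint in self.alive():
            try:
                return request(endpoint)
            except TimeoutError:
                self.mark_dead(endpoint)
        raise RuntimeError("no live endpoint answered")


def flaky_request(endpoint):
    # Hypothetical request function: "db-1" always times out.
    if endpoint == "db-1":
        raise TimeoutError(endpoint)
    return f"answer from {endpoint}"


pool = EndpointPool(["db-1", "db-2"])
print(pool.call(flaky_request))
print(pool.alive())
```

Injecting the clock and the request function makes the timeout path itself testable, which is the point of the second bullet.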

Page 28: Cloud patterns applied

• Single responsibility servers / services

• Security Groups define the interface through which services are allowed to talk to one another

• …and can be used to assign server role.
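
A minimal sketch of the role-assignment idea: an instance collects the tags of every security group it belongs to and reads its role from the result. The data shape and function name are hypothetical. In practice the group tags would come from the EC2 API, and the talk does not show how its own lookup is implemented.

```python
def tags_from_security_groups(groups):
    """Merge the tags of all security groups an instance belongs to."""
    merged = {}
    for group in groups:
        for key, value in group.get("tags", {}).items():
            if key in merged and merged[key] != value:
                # Two groups disagreeing on a tag is treated as a setup error.
                raise ValueError(f"conflicting values for tag {key!r}")
            merged[key] = value
    return merged


# Hypothetical membership of one backend instance.
instance_groups = [
    {"name": "base", "tags": {}},
    {"name": "backend", "tags": {"role": "backend", "branch": "feature_x"}},
]

tags = tags_from_security_groups(instance_groups)
print(tags["role"], tags["branch"])
```

Because the role lives on the group rather than on the machine, placing an instance in a group both opens the right ports and tells provisioning what to install.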

Page 33: Cloud patterns applied

production (branch=master)

• Backend Security Group (role=backend)

• Database Security Group (role=database): Allow Inbound Backend 3306

• Redis Security Group (role=feeds): Allow Inbound Backend 6379

• Base Security Group

• Metrics Security Group (role=metrics): Allow Inbound Base 8125

feature_x staging (branch=feature_x)

• Backend Security Group (role=backend)

• Database Security Group (role=database): Allow Inbound Backend 3306

• Redis Security Group (role=feeds): Allow Inbound Backend 6379

• Base Security Group

Page 34: Cloud patterns applied

{
  "Outputs": {
    "ApiEndpoint": {
      "Description": "DNS Endpoint to feature_xAPI staging",
      "Value": {"Ref": "apiendpoint"}
    },
    "backend1PrivateDNS": {
      "Description": "Private DNSName of the newly created EC2 backend1 instance",
      "Value": {"Fn::GetAtt": ["backend1", "PrivateDnsName"]}
    },
    "backend1PublicDNS": {
      "Description": "Public DNSName of the newly created EC2 backend1 instance",
      "Value": {"Fn::GetAtt": ["backend1", "PublicDnsName"]}
    },
    "db1PrivateDNS": {
      "Description": "Private DNSName of the newly created EC2 db1 instance",
      "Value": {"Fn::GetAtt": ["db1", "PrivateDnsName"]}
    },
    "db1PublicDNS": {
      "Description": "Public DNSName of the newly created EC2 db1 instance",
      "Value": {"Fn::GetAtt": ["db1", "PublicDnsName"]}
    },
    "redis1PrivateDNS": {
      "Description": "Private DNSName of the newly created EC2 redis1 instance",
      "Value": {"Fn::GetAtt": ["redis1", "PrivateDnsName"]}
    },
    "redis1PublicDNS": {
      "Description": "Public DNSName of the newly created EC2 redis1 instance",
      "Value": {"Fn::GetAtt": ["redis1", "PublicDnsName"]}
    }
  },
  "Resources": {
    "apiendpoint": {
      "Properties": {
        "HostedZoneId": "Z3HTG0V9588TAA",
        "Name": "api.feature_x.eyeem.com.",
        "ResourceRecords": [{"Fn::GetAtt": ["backend1", "PublicDnsName"]}],
        "TTL": 300,
        "Type": "CNAME"
      },
      "Type": "AWS::Route53::RecordSet"
    },
    "backend1": {
      "Properties": {
        "IamInstanceProfile": {"Ref": "puppetprovisioningprofile"},
        "ImageId": "ami-f2191786",
        "InstanceType": "c3.large",
        "KeyName": "eyeem-prod-new",
        "SecurityGroups": [{"Ref": "backendsg"}, "puppeteers"],
        "Tags": [
          {"Key": "background_tasks", "Value": "false"},
          {"Key": "branch", "Value": "feature_x"},
          {"Key": "jenkins_access", "Value": ""},
          {"Key": "puppetbranch", "Value": "master"},
          {"Key": "service_discovery", "Value": "true"}
        ],
        "UserData": {
          "Fn::Base64": {
            "Fn::Join": [
              "",
              [
                "#!/bin/bash\ncurl https://s3-eu-west-1.amazonaws.com/eyeem-deb-packages/gpg-key.asc | apt-key add -\necho \"deb http://eyeem-deb-packages.s3-website-eu-west-1.amazonaws.com $(lsb_release -cs) stable\" > /etc/apt/sources.list.d/eyeem.list\necho \"Package: *\nPin: origin eyeem-deb-packages.s3-website-eu-west-1.amazonaws.com\nPin-Priority: 550\" > /etc/apt/preferences.d/eyeem\naptitude update\naptitude install -y python-boto\nfetch_file s3://eyeem-configuration-management/provisioning/sg_tags.py > sg_tags.py\nexport puppetbranch=$(python sg_tags.py puppetbranch)\nif [ $puppetbranch != \"\" ]; then\n fetch_file \"s3://eyeem-configuration-management/provisioning-${puppetbranch}/base.user-data\" > ./base.sh\nelse\n fetch_file \"s3://eyeem-configuration-management/provisioning/base.user-data\" > ./base.sh\nfi\nbash ./base.sh\n",
                "\ncurl -X PUT -H 'Content-Type:' --data-binary '{\"Status\":\"SUCCESS\",\"Reason\":\"we made it here.\",\"UniqueId\":\"puppetwait\",\"Data\":\"Its gonna be alright.\"}' '",
                {"Ref": "backend1puppetwaithandle"},
                "'"
              ]
            ]
          }
        }
      },
      "Type": "AWS::EC2::Instance"
    },
    "backend1puppetwaitcondition": {
      "Properties": {
        "Handle": {"Ref": "backend1puppetwaithandle"},
        "Timeout": "7200"
      },
      "Type": "AWS::CloudFormation::WaitCondition"
    },
    "backend1puppetwaithandle": {
      "Type": "AWS::CloudFormation::WaitConditionHandle"
    },
    "backendsg": {
      "Properties": {
        "GroupDescription": "backend",
        "SecurityGroupIngress": [
          {"CidrIp": "0.0.0.0/0", "FromPort": "22", "IpProtocol": "tcp", "ToPort": "22"},
          {"CidrIp": "0.0.0.0/0", "FromPort": "80", "IpProtocol": "tcp", "ToPort": "80"}
        ],
        "Tags": [
          {"Key": "branch", "Value": "feature_x"},
          {"Key": "monitoring", "Value": "false"},
          {"Key": "puppetbranch", "Value": "master"},
          {"Key": "role", "Value": "backend"}
        ]
      },
      "Type": "AWS::EC2::SecurityGroup"
    },
    "db1": {
      "Properties": {
        "IamInstanceProfile": {"Ref": "dbprovisioningprofile"},
        "ImageId": "ami-25488752",
        "InstanceType": "c3.large",
        "KeyName": "eyeem-prod-new",
        "SecurityGroups": [{"Ref": "dbsg"}, "puppeteers"],
        "Tags": [
          {"Key": "restore_from_extract", "Value": "true"}
        ],
        "UserData": {
          "Fn::Base64": {
            "Fn::Join": [
              "",
              [
                "#!/bin/bash\ncurl https://s3-eu-west-1.amazonaws.com/eyeem-deb-packages/gpg-key.asc | apt-key add -\necho \"deb http://eyeem-deb-packages.s3-website-eu-west-1.amazonaws.com $(lsb_release -cs) stable\" > /etc/apt/sources.list.d/eyeem.list\necho \"Package: *\nPin: origin eyeem-deb-packages.s3-website-eu-west-1.amazonaws.com\nPin-Priority: 550\" > /etc/apt/preferences.d/eyeem\naptitude update\naptitude install -y python-boto\nfetch_file s3://eyeem-configuration-management/provisioning/sg_tags.py > sg_tags.py\nexport puppetbranch=$(python sg_tags.py puppetbranch)\nif [ $puppetbranch != \"\" ]; then\n fetch_file \"s3://eyeem-configuration-management/provisioning-${puppetbranch}/base.user-data\" > ./base.sh\nelse\n fetch_file \"s3://eyeem-configuration-management/provisioning/base.user-data\" > ./base.sh\nfi\nbash ./base.sh\n",
                "\ncurl -X PUT -H 'Content-Type:' --data-binary '{\"Status\":\"SUCCESS\",\"Reason\":\"we made it here.\",\"UniqueId\":\"puppetwait\",\"Data\":\"Its gonna be alright.\"}' '",
                {"Ref": "db1puppetwaithandle"},
                "'"
              ]
            ]
          }
        }
      },
      "Type": "AWS::EC2::Instance"
    },
    "db1puppetwaitcondition": {
      "Properties": {
        "Handle": {"Ref": "db1puppetwaithandle"},
        "Timeout": "7200"
      },
      "Type": "AWS::CloudFormation::WaitCondition"
    },
    "db1puppetwaithandle": {
      "Type": "AWS::CloudFormation::WaitConditionHandle"
    },
    "dbprovisioningprofile": {
      "Properties": {
        "Path": "/",
        "Roles": ["extract-access"]
      },
      "Type": "AWS::IAM::InstanceProfile"
    },
    "dbsg": {
      "Properties": {
        "GroupDescription": "db",
        "SecurityGroupIngress": [
          {"FromPort": "3306", "IpProtocol": "tcp", "SourceSecurityGroupName": {"Ref": "backendsg"}, "ToPort": "3306"},
          {"CidrIp": "0.0.0.0/0", "FromPort": "22", "IpProtocol": "tcp", "ToPort": "22"}
        ],
        "Tags": [
          {"Key": "branch", "Value": "feature_x"},
          {"Key": "monitoring", "Value": "false"},
          {"Key": "puppetbranch", "Value": "master"},
          {"Key": "role", "Value": "db"}
        ]
      },
      "Type": "AWS::EC2::SecurityGroup"
    },
    "puppetprovisioningprofile": {
      "Properties": {
        "Path": "/",
        "Roles": ["puppet-provisioning"]
      },
      "Type": "AWS::IAM::InstanceProfile"
    },
    "redis1": {
      "Properties": {
        "IamInstanceProfile": {"Ref": "puppetprovisioningprofile"},
        "ImageId": "ami-25488752",
        "InstanceType": "c3.large",
        "KeyName": "eyeem-prod-new",
        "SecurityGroups": [{"Ref": "redissg"}, "puppeteers"],
        "Tags": [],
        "UserData": {
          "Fn::Base64": {
            "Fn::Join": [
              "",
              [
                "#!/bin/bash\ncurl https://s3-eu-west-1.amazonaws.com/eyeem-deb-packages/gpg-key.asc | apt-key add -\necho \"deb http://eyeem-deb-packages.s3-website-eu-west-1.amazonaws.com $(lsb_release -cs) stable\" > /etc/apt/sources.list.d/eyeem.list\necho \"Package: *\nPin: origin eyeem-deb-packages.s3-website-eu-west-1.amazonaws.com\nPin-Priority: 550\" > /etc/apt/preferences.d/eyeem\naptitude update\naptitude install -y python-boto\nfetch_file s3://eyeem-configuration-management/provisioning/sg_tags.py > sg_tags.py\nexport puppetbranch=$(python sg_tags.py puppetbranch)\nif [ $puppetbranch != \"\" ]; then\n fetch_file \"s3://eyeem-configuration-management/provisioning-${puppetbranch}/base.user-data\" > ./base.sh\nelse\n fetch_file \"s3://eyeem-configuration-management/provisioning/base.user-data\" > ./base.sh\nfi\nbash ./base.sh\n",
                "\ncurl -X PUT -H 'Content-Type:' --data-binary '{\"Status\":\"SUCCESS\",\"Reason\":\"we made it here.\",\"UniqueId\":\"puppetwait\",\"Data\":\"Its gonna be alright.\"}' '",
                {"Ref": "redis1puppetwaithandle"},
                "'"
              ]
            ]
          }
        }
      },
      "Type": "AWS::EC2::Instance"
    },
    "redis1puppetwaitcondition": {
      "Properties": {
        "Handle": {"Ref": "redis1puppetwaithandle"},
        "Timeout": "7200"
      },
      "Type": "AWS::CloudFormation::WaitCondition"
    },
    "redis1puppetwaithandle": {
      "Type": "AWS::CloudFormation::WaitConditionHandle"
    },
    "redissg": {
      "Properties": {
        "GroupDescription": "redis",
        "SecurityGroupIngress": [
          {"FromPort": "6379", "IpProtocol": "tcp", "SourceSecurityGroupName": {"Ref": "backendsg"}, "ToPort": "6379"},
          {"CidrIp": "0.0.0.0/0", "FromPort": "22", "IpProtocol": "tcp", "ToPort": "22"}
        ],
        "Tags": [
          {"Key": "branch", "Value": "feature_x"},
          {"Key": "monitoring", "Value": "false"},
          {"Key": "puppetbranch", "Value": "master"},
          {"Key": "role", "Value": "redis"}
        ]
      },
      "Type": "AWS::EC2::SecurityGroup"
    }
  }
}

Page 35: Cloud patterns applied

(Excerpt of the same CloudFormation template as on Page 34. It differs only in that backend1 lists the "base" security group instead of "puppeteers" and its UserData script strings are omitted; the excerpt breaks off inside the db1 resource.)

Page 36: Cloud patterns applied

“JSON (JavaScript Object Notation) is a lightweight data-interchange format. It is easy for humans to read and write. It is easy for machines to parse and generate.”

–json.org

Page 38: Cloud patterns applied

eyeemstack create --machines backend db feed --restore_db extract --branch feature_x

• Python tool built on top of troposphere, a Python library for generating CloudFormation templates
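
To give a feel for what generating a template from code buys over hand-written JSON, here is a minimal sketch. It deliberately uses only the standard `json` module rather than troposphere, and every function name is hypothetical. It is not how eyeemstack is implemented, which the talk does not show.

```python
import json


def security_group(description, role, branch, ingress):
    """Build one AWS::EC2::SecurityGroup resource as a plain dict."""
    return {
        "Type": "AWS::EC2::SecurityGroup",
        "Properties": {
            "GroupDescription": description,
            "SecurityGroupIngress": ingress,
            "Tags": [
                {"Key": "branch", "Value": branch},
                {"Key": "role", "Value": role},
            ],
        },
    }


def allow_from_group(source, port):
    """Ingress rule: allow TCP `port` from another group in the template."""
    return {
        "IpProtocol": "tcp",
        "FromPort": str(port),
        "ToPort": str(port),
        "SourceSecurityGroupName": {"Ref": source},
    }


def build_template(branch):
    """Assemble the security groups for one staging branch."""
    return {
        "Resources": {
            "backendsg": security_group("backend", "backend", branch, []),
            "dbsg": security_group(
                "db", "db", branch, [allow_from_group("backendsg", 3306)]),
            "redissg": security_group(
                "redis", "redis", branch, [allow_from_group("backendsg", 6379)]),
        }
    }


template = build_template("feature_x")
print(json.dumps(template, indent=2, sort_keys=True))
```

The branch name is passed in once and repeated wherever it is needed, so a new staging environment is a different argument rather than a copied file.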

Page 39: Cloud patterns applied

vagrant up backend db feed

Page 40: Cloud patterns applied

class eyeem::profiles::backend::deploy {
  eyeem::deploy_codebase { "backend":
    directory => '/var/www/backend',
    bucket    => 'eyeem-web-backend',
    filename  => "backend-${::branch}.tar.gz",
    restart   => ['nginx', 'php5-fpm']
  }
}

define eyeem::deploy_codebase (
  $prefix = '',
  $directory,
  $bucket,
  $filename,
  $restart ) {

  if (member($::mountpoints, "${directory}/current") and $::environment == 'local') {
    notice("Looks like we are on Vagrant and you mounted the code in, skipping deploy.")
  } else {
    ( . . . )
  }
}

Page 42: Cloud patterns applied

• ~$0.70 for a single test run.

• ~$3.50 per workday.

• ~$17.64 per day for an always-on staging environment.

• Tests disaster recovery on a sample dataset.

• Scalable setup.

• < 10 minutes
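
The three cost figures can be related with a few lines of arithmetic. The prices are the ones on the slide; reading the per-workday figure as a whole number of test runs is an assumption, and the variable names are made up for the example.

```python
# Prices in cents, taken from the slide; integers avoid float rounding.
RUN_COST_CENTS = 70           # ~$0.70 per test run
WORKDAY_COST_CENTS = 350      # ~$3.50 per workday
ALWAYS_ON_COST_CENTS = 1764   # ~$17.64 per day for always-on staging

# Assumption: the workday figure is simply a number of on-demand runs.
runs_per_workday = WORKDAY_COST_CENTS // RUN_COST_CENTS

# Daily saving of on-demand staging compared with keeping it running.
saving_cents = ALWAYS_ON_COST_CENTS - WORKDAY_COST_CENTS

# Number of full runs per day that still cost less than always-on.
break_even_runs = ALWAYS_ON_COST_CENTS // RUN_COST_CENTS

print(f"runs per workday: {runs_per_workday}")
print(f"saved per workday: ${saving_cents / 100:.2f}")
print(f"runs per day before always-on is cheaper: {break_even_runs}")
```

Under that reading, the per-workday figure corresponds to five runs, and on-demand staging stays cheaper than an always-on environment up to 25 runs a day.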

Page 43: Cloud patterns applied

• Stagings just a click away.

Page 45: Cloud patterns applied

production (branch=master)

• Backend Security Group (role=backend)

• Base Security Group

• Metrics Security Group (role=metrics): Allow Inbound Base 8125

feature_x staging (branch=feature_x)

• Backend Security Group (role=backend)

• Base Security Group

Inventory Service Security Group (role=inventory)

• Instance: branch=feature_x, public_dns=api.feature_x.eyeem.com

• Instance: branch=master, public_dns=api.eyeem.com

Jobrunner Service Security Group (role=jobrunner)

Page 47: Cloud patterns applied

Culture

Practices

Tools

Page 48: Cloud patterns applied

• ~350 Job Executions last month

• That is 350 self-service operations

• Stagings everywhere

• Definition of Done: Can you boot it up using EyeEmStack and Vagrant?

• Lots of 99.999%s

Page 49: Cloud patterns applied

• “Everything fails all the time.”

• Test your repairs, automate everything.

• Distribute your data.

• Applications should be able to handle state transitions of service components and diagnose failure.

• Design your infrastructure to act as a service provider for your developers.

Page 50: Cloud patterns applied

Questions?
