Cloud patterns applied
Making the most of EC2 at EyeEm
• Site Reliability Engineer at EyeEm
• How do computers even work?
• Started as an operations guy in a scientific datacenter
• Now mostly developing and making users and developers happy
ME
@LarsFronius
Resilience, Development, Culture.
“If you think you can prevent failure, then you aren’t developing your ability to respond.”
– Paul Hammond
• Keep the number of machines that hold application state as small as possible
• Test restores of stateful machines
• …all the time.
• Throw away stateless servers
• Make sure they come back up with their expected behaviour
“The goal of operations is to have every day be just another boring day. Achieving this boredom depends on foreseeing the future performance of the system and making adjustments accordingly.”
– John Allspaw, Richard Cook
• Distributed datastores
• Many small servers rather than a few big ones
[Diagram: three instances serve 33% of the traffic each; when one fails, the remaining two serve 50% each.]
50% traffic increase on a single instance
[Diagram: eight instances serve 12.5% of the traffic each; when one fails, the remaining seven serve 14.3% each.]
~14.3% traffic increase on a single instance
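The two figures can be checked with a few lines of arithmetic. This is a minimal sketch that assumes traffic is spread evenly across instances; the function name is made up for illustration. For eight instances the exact increase is 1/7, roughly 14.3%.

```python
def increase_after_one_failure(instances: int) -> float:
    """Relative traffic increase per surviving instance when one instance fails."""
    if instances < 2:
        raise ValueError("need at least two instances to survive a failure")
    before = 1 / instances        # share of traffic per instance before the failure
    after = 1 / (instances - 1)   # share per instance once one is gone
    return (after - before) / before


for n in (3, 8):
    print(f"{n} instances: +{increase_after_one_failure(n):.1%} per surviving instance")
# 3 instances: +50.0% per surviving instance
# 8 instances: +14.3% per surviving instance
```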
• Mark endpoints dead
• Test timeouts
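One way a client could mark endpoints dead is sketched below. It is not taken from the talk: the class, the cooldown value and the endpoint names are assumptions, and a fake clock stands in for real time so the timeout path can be exercised in a test.

```python
import time


class EndpointPool:
    """Round-robin pool that skips endpoints marked dead until a cooldown expires."""

    def __init__(self, endpoints, cooldown=30.0, clock=time.monotonic):
        self._endpoints = list(endpoints)
        self._cooldown = cooldown
        self._clock = clock
        self._dead_until = {}  # endpoint -> time at which it may be retried
        self._next = 0

    def mark_dead(self, endpoint):
        self._dead_until[endpoint] = self._clock() + self._cooldown

    def pick(self):
        """Return the next endpoint that is not currently marked dead."""
        now = self._clock()
        for _ in range(len(self._endpoints)):
            endpoint = self._endpoints[self._next]
            self._next = (self._next + 1) % len(self._endpoints)
            if self._dead_until.get(endpoint, float("-inf")) <= now:
                return endpoint
        raise RuntimeError("all endpoints are marked dead")

    def call(self, request):
        """Try live endpoints in turn; mark one dead on timeout or connection error."""
        for _ in range(len(self._endpoints)):
            endpoint = self.pick()
            try:
                return request(endpoint)
            except (TimeoutError, ConnectionError):
                self.mark_dead(endpoint)
        raise RuntimeError("no endpoint answered")


# Usage with a fake clock and a request function whose first endpoint always times out.
now = [0.0]


def flaky_request(endpoint):
    if endpoint == "backend-1":  # hypothetical endpoint name
        raise TimeoutError("simulated timeout")
    return f"ok from {endpoint}"


pool = EndpointPool(["backend-1", "backend-2"], cooldown=30.0, clock=lambda: now[0])
first_answer = pool.call(flaky_request)  # backend-1 times out and is marked dead
```

Injecting the clock is what makes the timeout behaviour testable: advancing `now[0]` past the cooldown brings `backend-1` back into rotation without waiting.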
• Single responsibility servers / services
• Security Groups define the interface through which services are allowed to talk to one another
• …and can be used to assign server roles.
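The idea of security groups as the interface between services can be modelled in a few lines. This is only an illustrative sketch, not how EC2 evaluates rules: the data class and function are made up, and the group names and ports are the ones shown in the talk's security group diagram.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IngressRule:
    target_group: str  # group that receives the traffic
    source_group: str  # group that is allowed to send it
    port: int


# Rules as drawn in the diagram: backend may reach the database and Redis,
# every machine in the base group may reach metrics.
RULES = {
    IngressRule("database", "backend", 3306),
    IngressRule("redis", "backend", 6379),
    IngressRule("metrics", "base", 8125),
}


def may_connect(source_groups, target_groups, port, rules=RULES):
    """True if any group of the source may reach any group of the target on `port`."""
    return any(
        rule.port == port
        and rule.source_group in source_groups
        and rule.target_group in target_groups
        for rule in rules
    )


print(may_connect({"backend", "base"}, {"database", "base"}, 3306))  # True
print(may_connect({"redis", "base"}, {"database", "base"}, 3306))    # False
```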
[Diagram: security groups as roles and interfaces, built up over several slides]
Backend Security Group: role=backend
Database Security Group: role=database, Allow Inbound Backend 3306
Redis Security Group: role=feeds, Allow Inbound Backend 6379
Base Security Group
Metrics Security Group: role=metrics, Allow Inbound Base 8125
production: branch=master
feature_x staging: branch=feature_x
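The provisioning script in the talk calls `sg_tags.py puppetbranch` to read a tag, but the script itself is not shown. The sketch below is a guess at the core lookup only: it assumes the security groups arrive as dictionaries with a `Tags` list of `Key`/`Value` pairs, and it leaves out the AWS call that would fetch them. The function name and sample data are hypothetical.

```python
def tag_from_security_groups(security_groups, key, default=""):
    """Return the first value of tag `key` found on the given security groups."""
    for group in security_groups:
        for tag in group.get("Tags", []):
            if tag["Key"] == key:
                return tag["Value"]
    return default


# Sample data: an instance that is in a base group and a backend group.
groups = [
    {"GroupName": "base", "Tags": []},
    {
        "GroupName": "backendsg",
        "Tags": [
            {"Key": "role", "Value": "backend"},
            {"Key": "branch", "Value": "feature_x"},
            {"Key": "puppetbranch", "Value": "master"},
        ],
    },
]

print(tag_from_security_groups(groups, "role"))          # backend
print(tag_from_security_groups(groups, "puppetbranch"))  # master
```

Returning an empty string for a missing tag matches how the shell script tests `$puppetbranch` against `""` before choosing a provisioning path.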
{
  "Outputs": {
    "ApiEndpoint": {
      "Description": "DNS Endpoint to feature_xAPI staging",
      "Value": {"Ref": "apiendpoint"}
    },
    "backend1PrivateDNS": {
      "Description": "Private DNSName of the newly created EC2 backend1 instance",
      "Value": {"Fn::GetAtt": ["backend1", "PrivateDnsName"]}
    },
    "backend1PublicDNS": {
      "Description": "Public DNSName of the newly created EC2 backend1 instance",
      "Value": {"Fn::GetAtt": ["backend1", "PublicDnsName"]}
    },
    "db1PrivateDNS": {
      "Description": "Private DNSName of the newly created EC2 db1 instance",
      "Value": {"Fn::GetAtt": ["db1", "PrivateDnsName"]}
    },
    "db1PublicDNS": {
      "Description": "Public DNSName of the newly created EC2 db1 instance",
      "Value": {"Fn::GetAtt": ["db1", "PublicDnsName"]}
    },
    "redis1PrivateDNS": {
      "Description": "Private DNSName of the newly created EC2 redis1 instance",
      "Value": {"Fn::GetAtt": ["redis1", "PrivateDnsName"]}
    },
    "redis1PublicDNS": {
      "Description": "Public DNSName of the newly created EC2 redis1 instance",
      "Value": {"Fn::GetAtt": ["redis1", "PublicDnsName"]}
    }
  },
  "Resources": {
    "apiendpoint": {
      "Properties": {
        "HostedZoneId": "Z3HTG0V9588TAA",
        "Name": "api.feature_x.eyeem.com.",
        "ResourceRecords": [{"Fn::GetAtt": ["backend1", "PublicDnsName"]}],
        "TTL": 300,
        "Type": "CNAME"
      },
      "Type": "AWS::Route53::RecordSet"
    },
    "backend1": {
      "Properties": {
        "IamInstanceProfile": {"Ref": "puppetprovisioningprofile"},
        "ImageId": "ami-f2191786",
        "InstanceType": "c3.large",
        "KeyName": "eyeem-prod-new",
        "SecurityGroups": [{"Ref": "backendsg"}, "puppeteers"],
        "Tags": [
          {"Key": "background_tasks", "Value": "false"},
          {"Key": "branch", "Value": "feature_x"},
          {"Key": "jenkins_access", "Value": ""},
          {"Key": "puppetbranch", "Value": "master"},
          {"Key": "service_discovery", "Value": "true"}
        ],
        "UserData": {
          "Fn::Base64": {
            "Fn::Join": [
              "",
              [
                "#!/bin/bash\ncurl https://s3-eu-west-1.amazonaws.com/eyeem-deb-packages/gpg-key.asc | apt-key add -\necho \"deb http://eyeem-deb-packages.s3-website-eu-west-1.amazonaws.com $(lsb_release -cs) stable\" > /etc/apt/sources.list.d/eyeem.list\necho \"Package: *\nPin: origin eyeem-deb-packages.s3-website-eu-west-1.amazonaws.com\nPin-Priority: 550\" > /etc/apt/preferences.d/eyeem\naptitude update\naptitude install -y python-boto\nfetch_file s3://eyeem-configuration-management/provisioning/sg_tags.py > sg_tags.py\nexport puppetbranch=$(python sg_tags.py puppetbranch)\nif [ $puppetbranch != \"\" ]; then\n fetch_file \"s3://eyeem-configuration-management/provisioning-${puppetbranch}/base.user-data\" > ./base.sh\nelse\n fetch_file \"s3://eyeem-configuration-management/provisioning/base.user-data\" > ./base.sh\nfi\nbash ./base.sh\n",
                "\ncurl -X PUT -H 'Content-Type:' --data-binary '{\"Status\":\"SUCCESS\",\"Reason\":\"we made it here.\",\"UniqueId\":\"puppetwait\",\"Data\":\"Its gonna be alright.\"}' '",
                {"Ref": "backend1puppetwaithandle"},
                "'"
              ]
            ]
          }
        }
      },
      "Type": "AWS::EC2::Instance"
    },
    "backend1puppetwaitcondition": {
      "Properties": {
        "Handle": {"Ref": "backend1puppetwaithandle"},
        "Timeout": "7200"
      },
      "Type": "AWS::CloudFormation::WaitCondition"
    },
    "backend1puppetwaithandle": {
      "Type": "AWS::CloudFormation::WaitConditionHandle"
    },
    "backendsg": {
      "Properties": {
        "GroupDescription": "backend",
        "SecurityGroupIngress": [
          {"CidrIp": "0.0.0.0/0", "FromPort": "22", "IpProtocol": "tcp", "ToPort": "22"},
          {"CidrIp": "0.0.0.0/0", "FromPort": "80", "IpProtocol": "tcp", "ToPort": "80"}
        ],
        "Tags": [
          {"Key": "branch", "Value": "feature_x"},
          {"Key": "monitoring", "Value": "false"},
          {"Key": "puppetbranch", "Value": "master"},
          {"Key": "role", "Value": "backend"}
        ]
      },
      "Type": "AWS::EC2::SecurityGroup"
    },
    "db1": {
      "Properties": {
        "IamInstanceProfile": {"Ref": "dbprovisioningprofile"},
        "ImageId": "ami-25488752",
        "InstanceType": "c3.large",
        "KeyName": "eyeem-prod-new",
        "SecurityGroups": [{"Ref": "dbsg"}, "puppeteers"],
        "Tags": [
          {"Key": "restore_from_extract", "Value": "true"}
        ],
        "UserData": {
          "Fn::Base64": {
            "Fn::Join": [
              "",
              [
                "#!/bin/bash\ncurl https://s3-eu-west-1.amazonaws.com/eyeem-deb-packages/gpg-key.asc | apt-key add -\necho \"deb http://eyeem-deb-packages.s3-website-eu-west-1.amazonaws.com $(lsb_release -cs) stable\" > /etc/apt/sources.list.d/eyeem.list\necho \"Package: *\nPin: origin eyeem-deb-packages.s3-website-eu-west-1.amazonaws.com\nPin-Priority: 550\" > /etc/apt/preferences.d/eyeem\naptitude update\naptitude install -y python-boto\nfetch_file s3://eyeem-configuration-management/provisioning/sg_tags.py > sg_tags.py\nexport puppetbranch=$(python sg_tags.py puppetbranch)\nif [ $puppetbranch != \"\" ]; then\n fetch_file \"s3://eyeem-configuration-management/provisioning-${puppetbranch}/base.user-data\" > ./base.sh\nelse\n fetch_file \"s3://eyeem-configuration-management/provisioning/base.user-data\" > ./base.sh\nfi\nbash ./base.sh\n",
                "\ncurl -X PUT -H 'Content-Type:' --data-binary '{\"Status\":\"SUCCESS\",\"Reason\":\"we made it here.\",\"UniqueId\":\"puppetwait\",\"Data\":\"Its gonna be alright.\"}' '",
                {"Ref": "db1puppetwaithandle"},
                "'"
              ]
            ]
          }
        }
      },
      "Type": "AWS::EC2::Instance"
    },
    "db1puppetwaitcondition": {
      "Properties": {
        "Handle": {"Ref": "db1puppetwaithandle"},
        "Timeout": "7200"
      },
      "Type": "AWS::CloudFormation::WaitCondition"
    },
    "db1puppetwaithandle": {
      "Type": "AWS::CloudFormation::WaitConditionHandle"
    },
    "dbprovisioningprofile": {
      "Properties": {
        "Path": "/",
        "Roles": ["extract-access"]
      },
      "Type": "AWS::IAM::InstanceProfile"
    },
    "dbsg": {
      "Properties": {
        "GroupDescription": "db",
        "SecurityGroupIngress": [
          {"FromPort": "3306", "IpProtocol": "tcp", "SourceSecurityGroupName": {"Ref": "backendsg"}, "ToPort": "3306"},
          {"CidrIp": "0.0.0.0/0", "FromPort": "22", "IpProtocol": "tcp", "ToPort": "22"}
        ],
        "Tags": [
          {"Key": "branch", "Value": "feature_x"},
          {"Key": "monitoring", "Value": "false"},
          {"Key": "puppetbranch", "Value": "master"},
          {"Key": "role", "Value": "db"}
        ]
      },
      "Type": "AWS::EC2::SecurityGroup"
    },
    "puppetprovisioningprofile": {
      "Properties": {
        "Path": "/",
        "Roles": ["puppet-provisioning"]
      },
      "Type": "AWS::IAM::InstanceProfile"
    },
    "redis1": {
      "Properties": {
        "IamInstanceProfile": {"Ref": "puppetprovisioningprofile"},
        "ImageId": "ami-25488752",
        "InstanceType": "c3.large",
        "KeyName": "eyeem-prod-new",
        "SecurityGroups": [{"Ref": "redissg"}, "puppeteers"],
        "Tags": [],
        "UserData": {
          "Fn::Base64": {
            "Fn::Join": [
              "",
              [
                "#!/bin/bash\ncurl https://s3-eu-west-1.amazonaws.com/eyeem-deb-packages/gpg-key.asc | apt-key add -\necho \"deb http://eyeem-deb-packages.s3-website-eu-west-1.amazonaws.com $(lsb_release -cs) stable\" > /etc/apt/sources.list.d/eyeem.list\necho \"Package: *\nPin: origin eyeem-deb-packages.s3-website-eu-west-1.amazonaws.com\nPin-Priority: 550\" > /etc/apt/preferences.d/eyeem\naptitude update\naptitude install -y python-boto\nfetch_file s3://eyeem-configuration-management/provisioning/sg_tags.py > sg_tags.py\nexport puppetbranch=$(python sg_tags.py puppetbranch)\nif [ $puppetbranch != \"\" ]; then\n fetch_file \"s3://eyeem-configuration-management/provisioning-${puppetbranch}/base.user-data\" > ./base.sh\nelse\n fetch_file \"s3://eyeem-configuration-management/provisioning/base.user-data\" > ./base.sh\nfi\nbash ./base.sh\n",
                "\ncurl -X PUT -H 'Content-Type:' --data-binary '{\"Status\":\"SUCCESS\",\"Reason\":\"we made it here.\",\"UniqueId\":\"puppetwait\",\"Data\":\"Its gonna be alright.\"}' '",
                {"Ref": "redis1puppetwaithandle"},
                "'"
              ]
            ]
          }
        }
      },
      "Type": "AWS::EC2::Instance"
    },
    "redis1puppetwaitcondition": {
      "Properties": {
        "Handle": {"Ref": "redis1puppetwaithandle"},
        "Timeout": "7200"
      },
      "Type": "AWS::CloudFormation::WaitCondition"
    },
    "redis1puppetwaithandle": {
      "Type": "AWS::CloudFormation::WaitConditionHandle"
    },
    "redissg": {
      "Properties": {
        "GroupDescription": "redis",
        "SecurityGroupIngress": [
          {"FromPort": "6379", "IpProtocol": "tcp", "SourceSecurityGroupName": {"Ref": "backendsg"}, "ToPort": "6379"},
          {"CidrIp": "0.0.0.0/0", "FromPort": "22", "IpProtocol": "tcp", "ToPort": "22"}
        ],
        "Tags": [
          {"Key": "branch", "Value": "feature_x"},
          {"Key": "monitoring", "Value": "false"},
          {"Key": "puppetbranch", "Value": "master"},
          {"Key": "role", "Value": "redis"}
        ]
      },
      "Type": "AWS::EC2::SecurityGroup"
    }
  }
}
“JSON (JavaScript Object Notation) is a lightweight data-interchange format. It is easy for humans to read and write. It is easy for machines to parse and generate.”
– json.org
eyeemstack create --machines backend db feed --restore_db extract --branch feature_x
• eyeemstack is a Python tool built on top of troposphere, a Python library for creating CloudFormation descriptions
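To give a feel for what such a generator does, here is a rough sketch that builds a CloudFormation-shaped dictionary from a list of machines and a branch. It deliberately uses plain dictionaries instead of troposphere to stay dependency-free, it is not EyeEm's tool, and the function name and naming scheme (`<role>1`, `<role>sg`) are assumptions. The result omits required properties such as `ImageId`, so it is not deployable as is.

```python
import json


def build_stack_template(machines, branch):
    """Build a CloudFormation-style dict with one instance and one security group per machine."""
    resources = {}
    outputs = {}
    for role in machines:
        instance = f"{role}1"
        group = f"{role}sg"
        resources[group] = {
            "Type": "AWS::EC2::SecurityGroup",
            "Properties": {
                "GroupDescription": role,
                "Tags": [
                    {"Key": "branch", "Value": branch},
                    {"Key": "role", "Value": role},
                ],
            },
        }
        resources[instance] = {
            "Type": "AWS::EC2::Instance",
            "Properties": {
                "InstanceType": "c3.large",  # instance type used in the talk's template
                "SecurityGroups": [{"Ref": group}],
            },
        }
        outputs[f"{instance}PublicDNS"] = {
            "Description": f"Public DNSName of the newly created EC2 {instance} instance",
            "Value": {"Fn::GetAtt": [instance, "PublicDnsName"]},
        }
    return {"Resources": resources, "Outputs": outputs}


template = build_stack_template(["backend", "db", "feed"], "feature_x")
print(json.dumps(template, indent=2, sort_keys=True))
```

A few lines of Python per machine replace hundreds of lines of hand-written JSON, which is the point of generating the template rather than maintaining it by hand.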
vagrant up backend db feed
class eyeem::profiles::backend::deploy {
  eyeem::deploy_codebase { "backend":
    directory => '/var/www/backend',
    bucket    => 'eyeem-web-backend',
    filename  => "backend-${::branch}.tar.gz",
    restart   => ['nginx', 'php5-fpm']
  }
}

define eyeem::deploy_codebase (
  $prefix = '',
  $directory,
  $bucket,
  $filename,
  $restart ) {

  if (member($::mountpoints, "${directory}/current") and $::environment == 'local') {
    notice("Looks like we are on Vagrant and you mounted the code in, skipping deploy.")
  } else {
    ( . . . )
  }
}
• ~$0.70 for a single test run.
• ~$3.50 per workday.
• ~$17.64 per day for an always-on staging environment.
• Tests disaster recovery on a sample dataset.
• Scalable setup.
• < 10 minutes
• Stagings just a click away.
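The three cost figures can be related with simple arithmetic. The reading below is an assumption the talk does not state: that the always-on price divided by 24 gives the cost of one stack-hour, and that a single test run is billed as roughly one such hour.

```python
# Figures from the talk, in US dollars.
always_on_per_day = 17.64
single_test_run = 0.70
per_workday = 3.50

# Assumption: an always-on staging stack runs 24 hours a day at a flat hourly rate.
stack_hour = always_on_per_day / 24
hours_per_workday = per_workday / stack_hour

print(f"one stack-hour costs about ${stack_hour:.3f}")
print(f"the workday figure covers about {hours_per_workday:.1f} stack-hours")
print(f"always-on costs {always_on_per_day / per_workday:.1f}x the workday figure")
```

Under that assumption a test run (~$0.70) is close to one stack-hour, and running stagings only when needed costs roughly a fifth of keeping one up all day.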
[Diagram: inventory and jobrunner services added next to the production and staging stacks]
production: branch=master
feature_x staging: branch=feature_x
Backend Security Group: role=backend
Base Security Group
Metrics Security Group: role=metrics, Allow Inbound Base 8125
Inventory Service Security Group: role=inventory
Instance: branch=feature_x, public_dns=api.feature_x.eyeem.com
Instance: branch=master, public_dns=api.eyeem.com
Jobrunner Service Security Group: role=jobrunner
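The inventory service is only shown as a diagram, so the following is a guess at the kind of lookup it offers: instances carry attributes such as `branch` and `public_dns`, and callers filter on them. The class, the function and the `role` attribute on each instance are assumptions; the two DNS names come from the diagram.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    role: str        # assumed attribute, not shown on the instances in the diagram
    branch: str
    public_dns: str


INVENTORY = [
    Instance("backend", "master", "api.eyeem.com"),
    Instance("backend", "feature_x", "api.feature_x.eyeem.com"),
]


def find_instances(inventory, **filters):
    """Return instances whose attributes match every given filter, e.g. branch='master'."""
    return [
        inst
        for inst in inventory
        if all(getattr(inst, key) == value for key, value in filters.items())
    ]


for inst in find_instances(INVENTORY, branch="feature_x"):
    print(inst.public_dns)  # api.feature_x.eyeem.com
```

A job runner could use the same lookup to decide which machines a self-service job should target for a given branch.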
Culture
Practices
Tools
• ~350 Job Executions last month
• That is ~350 self-service operations
• Stagings everywhere
• Definition of Done: Can you boot it up using EyeEmStack and Vagrant?
• Lots of 99.999%s
• “Everything fails all the time.”
• Test your repairs, automate everything.
• Distribute your data.
• Applications should be able to handle state transitions of service-parts and diagnose failure.
• Design your infrastructure to act as a service provider for your developers.
Questions?