(dvo312) sony: building at-scale services with aws elastic beanstalk
TRANSCRIPT
© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Sumio Okada, Engineer, Sony
Shinya Kawaguchi, Engineer, Sony
October 2015
DVO 312
Building At-Scale Services with AWS Elastic Beanstalk
Build a Cloud-native Authentication and Profile Management Platform on AWS
What to expect from the session
You will learn how to use AWS Elastic Beanstalk:
• As a platform to easily build customized web applications at scale on AWS.
• To seamlessly build cloud-native applications with other AWS services.
Agenda
• Introduction
• Architecture
• Implementation
• Conclusion
Introduction
Who are we?
We provide cloud solutions for Sony products and applications.
• TV Side View
• Smart Tennis Sensor
• Smart B-Trainer
• Play Memories Online
Previous platform
An incident
Previous platform
• Built on top of IaaS
• Self-managed 'base services'
• Monolithic system
Motivation of rebuild
• Agility
• Robustness
• Efficiency
Achievement - agility
Item                     Before        After
Deployment time          Half a day    40 min.
Zero downtime release
Release trouble rate     30%           0%
Release interval         Bi-weekly     NA (on demand)
Achievement - robustness
Item                     Before                       After
Access surges impact     Unstable or down             No impact
IaaS trouble impact      Service damage               No impact
Emergency operation                                   Auto recover/healing
Related service down     Affecting an entire system   Minimum impact
Achievement - efficiency
Item                     Before                      After
Config management        Manual                      Git (Infrastructure as Code)
Infra for management     7+ self-managed services    0
Scaling                  Not flexible                Auto Scaling
Architecture
Authentication and profile management system
Mutually independent microservices
[Diagram: Service Providers; Auth & Profile (Frontend, Backend); Third-party Authentication Services]
System overview
Authentication and profile management system - 1
[Diagram: Service Providers and Third-party Authentication Services outside AWS; Route53; region us-west-2 with AZ-1 and AZ-2, each with public and private subnets; NAT (HA) and API in each AZ; DynamoDB (Profile DB); S3 (Resource, Batch, Config, Log, Backup); Data Pipeline with EC2 (Batch); API calls to DynamoDB/S3]
System overview
Authentication and profile management system - 2
[Diagram: same components as in the previous slide]
System overview - CloudFormation
Base layer
[Diagram: CloudFormation manages the base layer in us-west-2: AZ-1 and AZ-2 with public and private subnets, NAT (HA), S3, and DynamoDB (Profile DB)]
System overview - Elastic Beanstalk
Application layer
[Diagram: Elastic Beanstalk manages the API instances in both AZs on top of the base layer]
Continuous delivery system
Delivery system without self-managed infrastructure
[Diagram: Development: Push Code to the Code Repository, Kick off, 3 Build, 4 Unit Test, 5 Push Image, 6 Provision & Deploy, 7 Sanity Test, Result. QA: 5 Get Image, 5 Integration Test. Production.]
Throttling and Circuit Breaker
Self-defense for robustness
[Diagram: the APIs are protected by Throttling and Circuit Breaker on the side facing clients and on the side facing Third-party Authentication Services]
Zero-management infrastructure
[Diagram labels: Targets (EC2); Monitoring (CloudWatch, CloudWatch Logs; logs and metrics); Notification / Communication (SNS, Lambda); Log Analysis (S3, import into Redshift)]
Implementation
Implementation - motivation
Authentication & Profile Management Platform
• Reproducible
• Scalable
• Highly available and fault tolerant
• Secure and robust
• Transparent
Implementation - motivation: Reproducible
Infrastructure as code
• Automated operations
• Version control
• Continuous delivery
Infrastructure as code
• Versioning:
  • CloudFormation templates
  • Elastic Beanstalk configuration files (.ebextensions/*.config)
  • Application/environment configuration files
  • Automation scripts
Implementation - motivation: Scalable
Auto Scaling based on custom metric
• Custom metric via Data Pipeline
[Diagram: ELB metrics in CloudWatch; Data Pipeline; custom metric (Successful Response Rate per Instance); alarms; Auto Scaling group of App instances]
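One way such a custom metric could be derived is sketched below. The slides name the metric but not its formula, so the formula (successful ELB responses in a period, divided by the period length and the number of in-service instances), the function name, and the sample numbers are all illustrative assumptions.

```shell
#!/bin/bash
# Hypothetical helper: successful responses per second per instance.
# Assumption: SUCCESS_COUNT is the ELB 2xx count summed over PERIOD seconds.
success_rate_per_instance() {
  local success_count=$1 period=$2 instances=$3
  if (( period <= 0 || instances <= 0 )); then
    echo "period and instances must be positive" >&2
    return 1
  fi
  # Integer division keeps the sketch free of external tools.
  echo $(( success_count / period / instances ))
}

# Example: 36000 successful responses in 60 s across 4 instances.
success_rate_per_instance 36000 60 4
```

A scheduled job (Data Pipeline in this talk) would then publish a value like this to CloudWatch, where an alarm can trigger the scaling policy.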
Auto Scaling based on custom metric
• Custom scaling policies via .ebextensions
    Resources:
      AutoScalingScaleOutPolicy:
        Type: AWS::AutoScaling::ScalingPolicy
        Properties:
          AdjustmentType: ChangeInCapacity
          AutoScalingGroupName: { "Ref" : "AWSEBAutoScalingGroup" }
          ScalingAdjustment: 2
      AutoScalingScaleOutAlarm:
        Type: AWS::CloudWatch::Alarm
        Properties:
          Namespace: { "Fn::GetOptionSetting" : { "OptionName" : "AutoScalingMetricNamespace" } }
          MetricName: { "Fn::GetOptionSetting" : { "OptionName" : "AutoScalingMetricName" } }
          Dimensions: [ { "Name" : "LoadBalancerName", "Value" : { "Ref" : "AWSEBLoadBalancer" } } ]
          ...
          AlarmActions: [ { "Ref" : "AutoScalingScaleOutPolicy" } ]
Auto Scaling based on custom metric
• Disable default scaling policies via .ebextensions
    Resources:
      AWSEBCloudwatchAlarmHigh:
        Type: AWS::CloudWatch::Alarm
        Properties:
          AlarmActions: [ { "Ref" : "AWS::NoValue" } ]
      AWSEBCloudwatchAlarmLow:
        Type: AWS::CloudWatch::Alarm
        Properties:
          AlarmActions: [ { "Ref" : "AWS::NoValue" } ]
Implementation - motivation: Highly available and fault tolerant
High availability for application
• Zero downtime deployment
• Auto healing based on deep health check
• Disk space shortage prevention
Zero downtime deployment
• Rolling deployments
• Update application instances one by one
[Diagram: Auto Scaling group with three App instances, one per batch; all are Working, then one instance at a time switches to Updating]
Zero downtime deployment
• Rolling deployments via .ebextensions
    option_settings:
      "aws:elasticbeanstalk:command":
        BatchSizeType: Fixed
        BatchSize: 1
Zero downtime deployment
Conflict between rolling deployments and scaling out
• Taken care of by Elastic Beanstalk
Zero downtime deployment
• Rolling updates
• Dynamic batch size
[Diagram: Auto Scaling group (MinSize 2, MaxSize 10) with four Working App instances in two batches; the instance count was increased by scaling out]
Zero downtime deployment
• Rolling updates
• Keep the number of in-service instances
[Diagram: Auto Scaling group (MinSize 2, MaxSize 10); while the four App instances keep Working, two New instances are Launching; once the New instances are Working, two old App instances are Terminating and two more New instances are Launching]
Zero downtime deployment
• Rolling updates via .ebextensions
    option_settings:
      "aws:autoscaling:updatepolicy:rollingupdate":
        RollingUpdateEnabled: true
        MaxBatchSize: <num of running instances> / 2   # e.g. 2
        MinInstancesInService: <num of running instances>   # e.g. 4
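The rule for the two rolling update values can be sketched as a small helper. The function name and the lower bound of two instances are assumptions made for illustration; only the two formulas come from the slide.

```shell
#!/bin/bash
# Derive rolling update settings from the number of running instances,
# following the rule above: batch = running / 2, in service = running.
rolling_update_settings() {
  local running=$1
  # Assumption: fewer than 2 instances leaves nothing to split into batches.
  if (( running < 2 )); then
    echo "need at least 2 running instances" >&2
    return 1
  fi
  local max_batch=$(( running / 2 ))
  echo "MaxBatchSize=${max_batch} MinInstancesInService=${running}"
}

# Example with 4 running instances, as in the slide comments.
rolling_update_settings 4
```

Because the number of running instances changes with scaling, values like these would have to be recomputed just before each update.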
Zero downtime deployment
Tradeoff
• Rolling deployments/updates
  • Definite app version switching
  • Low tolerance to deployment failure (rolling deployments)
• CNAME swap
  • High tolerance to deployment failure
  • DNS propagation
Auto healing based on deep health check
• Deep health check
• Accuracy of system time
• Accessibility to main database (DynamoDB)
Auto healing based on deep health check
• Deep health check configuration via .ebextensions
    option_settings:
      "aws:elasticbeanstalk:application":
        "Application Healthcheck URL": /1/status
      "aws:elb:healthcheck":
        Interval: 15
        Timeout: 10
        HealthyThreshold: 3
        UnhealthyThreshold: 3
Auto healing based on deep health check
• Auto healing configuration via .ebextensions
    Resources:
      AWSEBAutoScalingGroup:
        Type: AWS::AutoScaling::AutoScalingGroup
        Properties:
          HealthCheckType: ELB
Auto healing based on deep health check
Rolling deployments with auto healing configuration
Problem
• Unexpected instance termination caused by Elastic Beanstalk
Workaround
• Suspend the HealthCheck process in AWSEBAutoScalingGroup during rolling deployments
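The workaround could be scripted with the AWS CLI roughly as follows. This is a sketch, not the speakers' script: the wrapper function, the DRY_RUN switch, and the group name are hypothetical, while `suspend-processes` and `resume-processes` are existing `aws autoscaling` subcommands.

```shell
#!/bin/bash
# Suspend or resume the HealthCheck process of an Auto Scaling group.
# DRY_RUN=1 (the default here) only prints the command, so the sketch
# can be run without AWS credentials.
DRY_RUN=${DRY_RUN:-1}

asg_healthcheck() {
  local action=$1 group=$2      # action: suspend | resume
  case "$action" in
    suspend|resume) ;;
    *) echo "unknown action: $action" >&2; return 1 ;;
  esac
  local cmd=(aws autoscaling "${action}-processes"
             --auto-scaling-group-name "$group"
             --scaling-processes HealthCheck)
  if [[ $DRY_RUN == 1 ]]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

# Hypothetical group name; the real one is generated by Elastic Beanstalk.
asg_healthcheck suspend my-env-AWSEBAutoScalingGroup
# ... run the rolling deployment here ...
asg_healthcheck resume my-env-AWSEBAutoScalingGroup
```

Resuming the process after the deployment matters, because auto healing stays disabled for as long as HealthCheck is suspended.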
Disk space shortage prevention
• Docker image local cache size
[Chart: disk usage (0% to 100%) over rolling deployments 1, 2, … n; areas: System, Docker Image Local Cache, Free; pulling new layers grows the Docker image local cache with each deployment]
Disk space shortage prevention
• Remove unused Docker images via .ebextensions
    files:
      "/opt/elasticbeanstalk/hooks/appdeploy/post/99_01_remove-unused-docker-images.sh":
        mode: "000755"
        owner: root
        group: root
        content: |
          #!/bin/bash
          docker images | grep -v "aws_beanstalk/" | grep -v "REPOSITORY" \
            | xargs -I {} /bin/bash -c '
              repository=$(echo "{}" | awk "{ print \$1 }")
              tag=$(echo "{}" | awk "{ print \$2 }")
              image_id=$(echo "{}" | awk "{ print \$3 }")
              docker rmi $image_id || docker tag $image_id $repository:$tag || true
            ' || true
Disk space shortage prevention
• Docker container log size
  • Container logs captured by Elastic Beanstalk
    • /var/log/eb-docker/containers/eb-current-app/*-stdouterr.log
    • Rotated
  • Original container logs
    • /var/lib/docker/containers/<cid>/<cid>-json.log
    • Keep growing in size
Disk space shortage prevention
• Docker container logs truncation via .ebextensions
    files:
      "/etc/cron.hourly/cron.logtruncate.docker.json.log.conf":
        mode: "000755"
        owner: root
        group: root
        content: |
          #!/bin/sh
          # truncate docker container logs here.
          # see appendix for the actual script implementation.
          ...
High availability for NAT
• NAT instance in AutoScalingGroup
• Periodic route table monitoring
NAT instance in AutoScalingGroup
• Static resources created via CloudFormation
[Diagram: one AWS Region with AZ-1 and AZ-2, each with a public subnet and a private subnet for apps; one Auto Scaling group per AZ (MinSize 1, MaxSize 1); resources tagged NetworkSegment NAT-A and NAT-B; Internet]
NAT instance in AutoScalingGroup
• Dynamic NAT instances
[Diagram: the AutoScalingGroup launches a new NAT instance in each public subnet (Pending, public IP)]
NAT instance in AutoScalingGroup
• Dynamic NAT instance configuration via cloud-init
[Diagram: both NAT instances are Running; cloud-init disables the SRC/DST check, assigns the Elastic IP, etc.]
NAT instance in AutoScalingGroup
• Route table lookup
[Diagram: the new NAT instance looks up route tables based on the NetworkSegment tag]
NAT instance in AutoScalingGroup
• Dynamic route configuration
[Diagram: the route tables tagged NetworkSegment NAT-A and NAT-B are now also tagged RoutingStatus OK]
Periodic route table monitoring
• Running normally
[Diagram: both NAT instances are Running; both route tables show 0.0.0.0/0 Active and RoutingStatus OK. NAT instances periodically monitor the route tables located in the other AZ.]
Periodic route table monitoring
• Black hole route detection
[Diagram: the NAT instance for NAT-A is Terminated and its route shows 0.0.0.0/0 Black Hole. The healthy NAT instance detects the blackhole internet route.]
Periodic route table monitoring
• Outbound traffic takeover
[Diagram: the healthy NAT instance takes over outbound traffic to the internet; the NAT-A route table shows 0.0.0.0/0 Active and RoutingStatus TakenOver. Meanwhile the AutoScalingGroup launches a new NAT instance (Pending).]
Periodic route table monitoring
• Route table lookup
[Diagram: the new NAT instance is Running and looks up route tables based on the NetworkSegment tag]
Periodic route table monitoring
• Outbound traffic recovery
[Diagram: the new NAT instance recovers the internet route; both route tables show 0.0.0.0/0 Active and RoutingStatus OK]
Periodic route table monitoring
Network capacity planning for NAT instances
• Need to consider the total amount of outbound traffic coming from application instances across Availability Zones, because one NAT instance may carry both zones during a takeover
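The monitoring loop described in the preceding slides can be reduced to a small decision function. The sketch below is an interpretation of the diagrams, not the speakers' script: the function names, the action names (claim, takeover), and the resource IDs are hypothetical, and the AWS CLI calls are only printed. `replace-route` and `create-tags` are existing `aws ec2` subcommands; `replace-route` assumes the 0.0.0.0/0 route already exists.

```shell
#!/bin/bash
# Decide what a NAT instance should do with one route table.
#   table_segment : NetworkSegment tag of the route table (e.g. NAT-A)
#   my_segment    : NetworkSegment this NAT instance serves
#   route_state   : state of the 0.0.0.0/0 route (active | blackhole)
#   status_tag    : RoutingStatus tag of the route table (OK | TakenOver)
route_action() {
  local table_segment=$1 my_segment=$2 route_state=$3 status_tag=$4
  if [[ $table_segment == "$my_segment" ]]; then
    # Own table: (re)claim it when the route is broken or a peer took it over.
    if [[ $route_state == blackhole || $status_tag != OK ]]; then
      echo claim
    else
      echo none
    fi
  elif [[ $route_state == blackhole ]]; then
    echo takeover    # the peer NAT is gone; carry its outbound traffic
  else
    echo none
  fi
}

# Print the AWS CLI calls for an action (printing only, nothing is executed).
print_commands() {
  local action=$1 rtb=$2 instance=$3 status
  case "$action" in
    claim) status=OK ;;
    takeover) status=TakenOver ;;
    *) return 0 ;;
  esac
  echo "aws ec2 replace-route --route-table-id $rtb --destination-cidr-block 0.0.0.0/0 --instance-id $instance"
  echo "aws ec2 create-tags --resources $rtb --tags Key=RoutingStatus,Value=$status"
}

# Hypothetical IDs: the NAT-B instance finds the NAT-A route black-holed.
print_commands "$(route_action NAT-A NAT-B blackhole OK)" rtb-aaaa1111 i-bbbb2222
```

Keeping the decision separate from the AWS calls makes the takeover and recovery rules easy to check without touching real route tables.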
Implementation - motivation: Secure and robust
Source IP address whitelisting
• Without whitelisting
[Diagram: clients x.x.x.1, x.x.x.5, x.x.x.6; AWSEBLoadBalancer with AWSEBLoadBalancerSecurityGroup (No Inbound Rules), applied by Elastic Beanstalk; three App instances]
Source IP address whitelisting
• With whitelisting
[Diagram: AWSEBLoadBalancer (tag IPWhitelistGroup: Group1) keeps AWSEBLoadBalancerSecurityGroup (tag IPWhitelistGroup: DefaultGroup, No Inbound Rules) and also gets the security groups ip-whitelist-group1-1 to ip-whitelist-group1-4 (tag IPWhitelistGroup: Group1), applied via script. Each rule allows HTTPS TCP 443 from one source address such as x.x.x.1/32; rules are added via script from configuration files.]
• Max 200 (4*50) rules are available
Source IP address whitelisting
• Tagging built-in resources via .ebextensions
    Resources:
      AWSEBLoadBalancer:
        Type: AWS::ElasticLoadBalancing::LoadBalancer
        Properties:
          Tags:
            - { Key: IPWhitelistGroup, Value: Group1 }
      AWSEBLoadBalancerSecurityGroup:
        Type: AWS::EC2::SecurityGroup
        Properties:
          GroupDescription: "Load Balancer Security Group"
          VpcId: { "Fn::GetOptionSetting" : { "OptionName" : "VPCId" } }
          Tags:
            - { Key: IPWhitelistGroup, Value: DefaultGroup }
Source IP address whitelisting
• Fill required properties in security group for ELB via .ebextensions
  • Specifying GroupDescription and VpcId, as in the configuration above, is also required in order to modify the AWSEBLoadBalancerSecurityGroup resource via .ebextensions.
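The 200-rule ceiling (4 security groups with 50 rules each) suggests that the script spreads whitelist entries over the groups. A minimal sketch of that mapping follows; the function name and the fill-in-order strategy are assumptions, and only the group naming pattern and the two limits come from the slides.

```shell
#!/bin/bash
# Map the N-th whitelist rule (0-based) to one of the security groups.
# From the slide: 4 groups x 50 rules = 200 rules at most.
GROUPS_PER_ELB=4
RULES_PER_GROUP=50

group_for_rule() {
  local index=$1 prefix=${2:-ip-whitelist-group1}
  local limit=$(( GROUPS_PER_ELB * RULES_PER_GROUP ))
  if (( index < 0 || index >= limit )); then
    echo "rule index $index is outside the limit of $limit rules" >&2
    return 1
  fi
  # Assumption: groups are filled in order, 50 rules at a time.
  echo "${prefix}-$(( index / RULES_PER_GROUP + 1 ))"
}

group_for_rule 0     # the first rule goes to the first group
group_for_rule 50    # the 51st rule spills into the second group
```

A whitelist that needs more than 200 source addresses would not fit this scheme and would call for a different approach, such as wider CIDR blocks.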
Connection/request throttling
• Throttling per client (source IP address)
[Diagram: on Amazon Linux, requests from External Services and Internal Services are throttled in front of the App APIs running in the Docker container; requests marked Over Limit are rejected; Third-party Authentication Services sit behind the App]
Connection/request throttling
• Throttling per remote user (internal service)
[Diagram: same layout; requests from an internal service marked Over Limit are rejected]
Connection/request throttling
• nginx configuration file installation via .ebextensions
    files:
      "/etc/nginx/throttling/limit-zone-def.conf":
        mode: "000644"
        owner: root
        group: root
        content: |
          # include in http context
          limit_conn_zone $http_x_forwarded_for zone=conn_perclient:10m;
          limit_conn_zone $hostname zone=conn_total:1m;
          limit_conn_status 429;
          limit_req_zone $remote_user zone=req_perservice:10m rate=150r/s;
          limit_req_zone $hostname zone=req_total:1m rate=200r/s;
          limit_req_status 429;
      "/etc/nginx/throttling/limit-per.conf":
        mode: "000644"
        owner: root
        group: root
        content: |
          # include in location context
          limit_conn conn_perclient 75;
          limit_req zone=req_perservice burst=300 nodelay;
      "/etc/nginx/throttling/limit-total.conf":
        mode: "000644"
        owner: root
        group: root
        content: |
          # include in location context
          limit_conn conn_total 300;
          limit_req zone=req_total burst=400 nodelay;
Connection/request throttling
• nginx configuration script (.ebextensions/nginx-conf.sh)
    #!/bin/bash
    EB_CONFIG_HTTP_PORT=$(/opt/elasticbeanstalk/bin/get-config container -k instance_port)
    cat > /etc/nginx/sites-available/nginx-docker-proxy.conf <<EOF
    ...
    include throttling/limit-zone-def.conf;
    server {
      listen $EB_CONFIG_HTTP_PORT;
      location / {
        ...
        include throttling/limit-per.conf;
        include throttling/limit-total.conf;
      }
      location ~ /.+?/status {
        ...
        include throttling/limit-per.conf;
      }
    }
    EOF
    rm -f /etc/nginx/sites-enabled/*
    ln -sf /etc/nginx/sites-available/nginx-docker-proxy.conf /etc/nginx/sites-enabled/
Connection/request throttling
• nginx configuration via .ebextensions
    container_commands:
      nginx-conf-for-throttling:
        command: 'bash .ebextensions/nginx-conf.sh'
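To get a feel for what `rate=150r/s` with `burst=300 nodelay` admits, the request limiting can be modeled as a leaky bucket. The sketch below is a simplified model written for this purpose, not nginx source code; the function and variable names are made up, and only the rate and burst values come from the configuration above.

```shell
#!/bin/bash
# Simplified leaky-bucket model of a "limit_req ... nodelay" zone.
RATE=150       # allowed requests per second (rate=150r/s)
BURST=300      # excess requests tolerated (burst=300)
excess=0       # current backlog, in 1/1000 of a request
last_ms=-1     # arrival time of the last admitted request (-1: none yet)

# admit NOW_MS: returns 0 when the request passes, 1 when it would get 429.
admit() {
  local now_ms=$1 new_excess=0
  if (( last_ms >= 0 )); then
    # The backlog drains at RATE per second and grows by one request.
    new_excess=$(( excess - RATE * (now_ms - last_ms) + 1000 ))
    if (( new_excess < 0 )); then new_excess=0; fi
  fi
  if (( new_excess > BURST * 1000 )); then
    return 1
  fi
  excess=$new_excess
  last_ms=$now_ms
  return 0
}

# count_admitted N NOW_MS: fire N requests at the same instant and
# print how many of them pass.
count_admitted() {
  local n=$1 now_ms=$2 passed=0 i
  for (( i = 0; i < n; i++ )); do
    if admit "$now_ms"; then passed=$(( passed + 1 )); fi
  done
  echo "$passed"
}

count_admitted 310 0      # a burst at t=0
count_admitted 200 1000   # another burst one second later
```

In this model the first burst admits 301 requests (the first one plus 300 of excess), and one second later the bucket has drained enough for 150 more, which matches the configured rate.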
Connection/request throttling
Tradeoff
• Advantages taken from throttling
• Low compatibility
[Diagram: External Services, Internal Services]
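Circuit Breaker
• Proxy object for each external service
[Diagram: inside the Docker container on Amazon Linux, the App APIs reach each third-party authentication service through its own circuit breaker; three breakers are Closed and one is Open, so calls through the open breaker fail immediately]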
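The behavior of one such proxy object can be sketched as a small state machine. This is a generic illustration with assumed thresholds and made-up function names; it is not the API of Sony's gobreaker library, and for brevity it lets every request through while half-open instead of a single probe.

```shell
#!/bin/bash
# Minimal circuit breaker state machine for one external service.
MAX_FAILURES=3     # consecutive failures that trip the breaker (assumption)
OPEN_SECONDS=60    # how long to fail fast before probing again (assumption)
cb_state=closed
cb_failures=0
cb_opened_at=0

# cb_allow NOW: succeeds if a request may be sent to the external service.
cb_allow() {
  local now=$1
  if [[ $cb_state == open ]]; then
    if (( now - cb_opened_at >= OPEN_SECONDS )); then
      cb_state=half-open     # timeout elapsed: let a probe request through
      return 0
    fi
    return 1                 # immediate failure, no call is made
  fi
  return 0
}

# cb_report NOW RESULT: record the outcome ("ok" or "fail") of a request.
cb_report() {
  local now=$1 result=$2
  if [[ $result == ok ]]; then
    cb_state=closed
    cb_failures=0
    return 0
  fi
  cb_failures=$(( cb_failures + 1 ))
  if [[ $cb_state == half-open ]] || (( cb_failures >= MAX_FAILURES )); then
    cb_state=open
    cb_opened_at=$now
  fi
  return 0
}

# Three failures in a row open the circuit; the next call fails fast.
cb_report 0 fail; cb_report 1 fail; cb_report 2 fail
if ! cb_allow 3; then echo "circuit open: fail immediately"; fi
```

Keeping one set of state per external service, as the slide describes, means a failing third-party service trips only its own breaker.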
Implementation - motivation: Transparent
Comprehensive log monitoring
[Diagram labels: Targets (App, NAT); Monitoring (CloudWatch, CloudWatch Logs; logs and metrics); Notification / Communication (SNS, Lambda); Log Analysis (S3, import into Redshift)]
Comprehensive log monitoring
• LogGroup creation via .ebextensions
    Resources:
      CWLSyslogMessagesLogGroup:
        Type: "AWS::Logs::LogGroup"
        DependsOn: AWSEBBeanstalkMetadata
        Properties:
          LogGroupName: { "Fn::Join" : [ "-", [ { "Ref" : "AWSEBEnvironmentName" }, "syslog-messages" ] ] }
          RetentionInDays: 14
Comprehensive log monitoring
• CloudWatch Logs agent config file via .ebextensions
    Resources:
      AWSEBAutoScalingGroup:
        Metadata:
          "AWS::CloudFormation::Init":
            CWLogsAgentConfigSetup:
              files:
                "/tmp/cwlogs/conf.d/core-logs.conf":
                  content : |
                    [/var/log/messages]
                    file = /var/log/messages
                    log_group_name = `{ "Ref" : "CWLSyslogMessagesLogGroup" }`
                    log_stream_name = {instance_id}
                    datetime_format = %b %d %H:%M:%S
Searchable log retention
[Diagram labels: Targets (App, NAT); Monitoring (CloudWatch, CloudWatch Logs; logs and metrics); Notification / Communication (SNS, Lambda); Log Analysis (S3, import into Redshift); log shipping settings: flush_interval 60s, flush_at_shutdown true]
Searchable log retention
• td-agent configuration via .ebextensions
    files:
      "/etc/sysconfig/td-agent":
        mode: "000644"
        owner: root
        group: root
        content: |
          # Run as root user
          TD_AGENT_ARGS="/usr/sbin/td-agent --group td-agent --log /var/log/td-agent/td-agent.log --use-v1-config \
          --suppress-repeated-stacktrace"
          DAEMON_ARGS="--user root"
    commands:
      01-prepare-installer:
        command: ... # Install td-agent installation script to /tmp/td-agent/install-td-agent-v2.sh
      02-run-installer-td-agent:
        command: bash /tmp/td-agent/install-td-agent-v2.sh
      03-setup-configration:
        command: ... # Configure log sources for td-agent
      04-restart-td-agent:
        command: service td-agent restart
Searchable log retention
• Enable ELB to upload access logs to Amazon S3
    Resources:
      AWSEBLoadBalancer:
        Type: AWS::ElasticLoadBalancing::LoadBalancer
        Properties:
          AccessLoggingPolicy:
            S3BucketName: { "Fn::GetOptionSetting" : { "OptionName" : "LogsBucketName" } }
            S3BucketPrefix: "elb"
            Enabled: true
            EmitInterval: 5 # minutes
Conclusion
Challenges and expectations
• Compatibility
• Ease of operation test
Eight trouble-free months in production with Elastic Beanstalk
• Flexibility: satisfies customization needs
• Reliability: no major problems
• Simplicity: simplified DevOps
Thank you!
Question and answer
Appendix
Sony open source software
• gobreaker
• Go implementation of circuit breaker
• Available on GitHub
• https://github.com/sony/gobreaker
• Feel free to submit pull requests and raise issues on the
GitHub project
Sony open source software
• Sonyflake
• Go implementation of distributed unique ID generator
• Available on GitHub
• https://github.com/sony/sonyflake
• Small utility for AWS (VPC) included
• Example running on EB provided
• Feel free to submit pull requests and raise issues on the
GitHub project
Articles
• Continuous Delivery with Golang and Docker
• https://circleci.com/stories/sony
References
• Advanced network automation
• (ARC401) Black-Belt Networking for the Cloud Ninja | AWS
re:Invent 2014
• Docker container log rotation
• https://github.com/docker/docker/issues/7333
• https://docs.docker.com/reference/logging/overview/
Auto Scaling design
Scale out timing chart
[Timing chart: EC2 State (Pending, Running); ELB Instance State (Out of Service, In Service); Auto Scaling Timers (Health Check Grace Period, Cooldown Period of the scale out policy); events: Execute Policy, Register Instance, Deployment, App Startup, ELB Determination, In Service Dead Line, Resume Auto Scaling]
* in the case of HealthCheckType: ELB
Auto Scaling design
Scale out timing parameters
• ELB Determination: HealthCheck Interval (15) x HealthyThreshold (3) = 45
• Deployment: 300 avg.
• Health Check Grace Period: 600
• Margin: 300
• Margin for Balancing & Metric: 300
• Cooldown Period (scale out policy): 900
* in the case of HealthCheckType: ELB
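The relationship between these timing values can be checked with a little arithmetic. The sketch below takes the numbers from the chart and assumes they are seconds; the assumption that an instance is in service after the average deployment time plus the ELB determination time, and the function names, are illustrative.

```shell
#!/bin/bash
# Values from the timing chart, assumed to be in seconds.
HEALTHCHECK_INTERVAL=15
HEALTHY_THRESHOLD=3
DEPLOYMENT_AVG=300
GRACE_PERIOD=600

# Time the ELB needs to declare a freshly started instance healthy.
elb_determination() {
  echo $(( HEALTHCHECK_INTERVAL * HEALTHY_THRESHOLD ))
}

# Assumption: in service = average deployment time + ELB determination.
time_to_in_service() {
  echo $(( DEPLOYMENT_AVG + $(elb_determination) ))
}

# Succeeds when the grace period covers the expected time to in service.
grace_period_ok() {
  (( $(time_to_in_service) <= GRACE_PERIOD ))
}

elb_determination
time_to_in_service
```

A grace period shorter than the time to in service would let ELB-based health checks terminate instances that are still deploying, which is why the chart keeps a margin.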
Examples
• Elastic IP association via cloud-init
    #!/bin/bash
    REGION=$1
    EIP_ALLOCATION_ID=$2
    INSTANCE_ID=$(curl --silent http://169.254.169.254/latest/meta-data/instance-id)
    # Retry until the instance is running and the address is associated.
    while true; do
      INSTANCE_STATUS=$(aws --region "${REGION}" --output text \
        ec2 describe-instance-status \
        --instance-ids "${INSTANCE_ID}" \
        --filters Name=instance-state-name,Values=running)
      if [[ $? = 0 && "${INSTANCE_STATUS}" != "" ]]; then
        aws --region "${REGION}" --output text \
          ec2 associate-address --instance-id "${INSTANCE_ID}" \
          --allocation-id "${EIP_ALLOCATION_ID}" && break
      fi
      sleep 5s
    done
Examples
• Elastic IP association via cloud-init
  • The associate-address command fails if the instance is still in the pending state
  • Need to wait for the instance to reach the running state before executing the associate-address command
Examples
• Connection draining
Keep accepting requests (10~20s)
ConnectionDrainingTimeout
Examples
• Connection draining via .ebextensions
    option_settings:
      "aws:elb:policies":
        ConnectionDrainingEnabled: true
        ConnectionDrainingTimeout: 80 # 20 + 60 seconds
Examples
• Docker container log truncation
    #!/bin/bash
    # bash is required: ${cid::12} below is a bash substring expansion.
    cidfile=$(/opt/elasticbeanstalk/bin/get-config container -k app_deploy_file)
    [ ! -r "${cidfile}" ] && exit 0
    cid=$(cat "${cidfile}")
    scid=${cid::12}
    dockerlog="/var/lib/docker/containers/${cid}/${cid}-json.log"
    [ ! -w "${dockerlog}" ] && exit 0
    # The eb-log file made by Elastic Beanstalk.
    eblog="/var/log/eb-docker/containers/eb-current-app/${scid}-stdouterr.log"
    # PID of docker logs command related to the Container-ID.
    logspids=$(ps aux | grep "docker logs -f ${scid}" | grep -v grep | awk '{print $2}')
    for logspid in ${logspids}
    do
      # Count FD of docker logs related to the eb-log file.
      eblogfd=$(lsof -p ${logspid} | grep "${eblog}" | wc -l)
      # Expect stdout and stderr to be redirected to the eb-log file.
      [ ! ${eblogfd} -eq 2 ] && continue
      # Now, can truncate the docker-log file.
      cat /dev/null > "${dockerlog}"
      break
    done
Examples
• Run ntpd in slew mode via .ebextensions
    files:
      "/etc/sysconfig/ntpd":
        mode: "000644"
        owner: root
        group: root
        content: |
          OPTIONS="-g -x"
    commands:
      "ntpd-service-restart":
        command: service ntpd restart
Examples
• Scaling event notification via .ebextensions
    Resources:
      AWSEBAutoScalingGroup:
        Type: AWS::AutoScaling::AutoScalingGroup
        Properties:
          HealthCheckType: ELB
          NotificationConfiguration:
            TopicARN: { "Fn::GetOptionSetting" : { "OptionName" : "ASGTopicArn" } }
            NotificationTypes:
              - autoscaling:EC2_INSTANCE_LAUNCH
              - autoscaling:EC2_INSTANCE_LAUNCH_ERROR
              - autoscaling:EC2_INSTANCE_TERMINATE
              - autoscaling:EC2_INSTANCE_TERMINATE_ERROR
Examples
• td-agent installation script
    #!/usr/bin/env bash
    # Enterprise Linux 7 (releasever is '7')
    # add GPG key
    rpm --import http://packages.treasuredata.com/GPG-KEY-td-agent
    # add treasure data repository to yum
    cat > /etc/yum.repos.d/td.repo <<EOF
    [treasuredata]
    name=TreasureData
    baseurl=http://packages.treasuredata.com/2/redhat/7/\$basearch
    gpgcheck=1
    gpgkey=http://packages.treasuredata.com/GPG-KEY-td-agent
    EOF
    # install the toolbelt
    yum install -y td-agent-2.1.5-1
    # install plugins
    /opt/td-agent/embedded/bin/fluent-gem install --no-document fluent-plugin-tail_path -v "=0.0.3"
    /opt/td-agent/embedded/bin/fluent-gem install --no-document fluent-plugin-forest -v "=0.3.0"
    /opt/td-agent/embedded/bin/fluent-gem install --no-document fluent-plugin-add -v "=0.0.3"
    # this plugin will no longer be required in the next td-agent version.
    /opt/td-agent/embedded/bin/fluent-gem install --no-document fluent-plugin-s3 -v "=0.5.7"
    # enable service
    chkconfig td-agent on