AWS re:Invent 2016: Running Batch Jobs on Amazon ECS (CON310)
![Page 1: AWS re:Invent 2016: Running Batch Jobs on Amazon ECS (CON310)](https://reader033.vdocuments.site/reader033/viewer/2022042723/587126311a28abe4448b62dd/html5/thumbnails/1.jpg)
© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Asha Chakrabarty, Senior Solutions Architect, AWS
Will White, Engineering Lead, Mapbox
December 1, 2016
Running Batch Processes on ECS
CON310
![Page 2: AWS re:Invent 2016: Running Batch Jobs on Amazon ECS (CON310)](https://reader033.vdocuments.site/reader033/viewer/2022042723/587126311a28abe4448b62dd/html5/thumbnails/2.jpg)
What to Expect from the Session
• Understand the challenges of running batch processes
• Why Amazon ECS for Batch?
• Architectural Design Patterns
• Best Practices
• Mapbox and Amazon ECS
![Page 3: AWS re:Invent 2016: Running Batch Jobs on Amazon ECS (CON310)](https://reader033.vdocuments.site/reader033/viewer/2022042723/587126311a28abe4448b62dd/html5/thumbnails/3.jpg)
Challenges of Running Batch Workloads
• Typically resource intensive
• Time constraints for completion
• Potential impact on concurrent batch jobs
• Scaling infrastructure resources
• Ensuring effective resource utilization and cost savings
• Fragile and unreliable
![Page 4: AWS re:Invent 2016: Running Batch Jobs on Amazon ECS (CON310)](https://reader033.vdocuments.site/reader033/viewer/2022042723/587126311a28abe4448b62dd/html5/thumbnails/4.jpg)
What Batch Workloads Need
• Reliable
• Easy development
• Easy deployment
• High efficiency
• Low ops load
• Cost effective
![Page 5: AWS re:Invent 2016: Running Batch Jobs on Amazon ECS (CON310)](https://reader033.vdocuments.site/reader033/viewer/2022042723/587126311a28abe4448b62dd/html5/thumbnails/5.jpg)
Why ECS for Batch Processing?
![Page 6: AWS re:Invent 2016: Running Batch Jobs on Amazon ECS (CON310)](https://reader033.vdocuments.site/reader033/viewer/2022042723/587126311a28abe4448b62dd/html5/thumbnails/6.jpg)
Cluster Management Made Easy
• Nothing to run
• Complete state
• Control and monitoring
• Scale
![Page 7: AWS re:Invent 2016: Running Batch Jobs on Amazon ECS (CON310)](https://reader033.vdocuments.site/reader033/viewer/2022042723/587126311a28abe4448b62dd/html5/thumbnails/7.jpg)
Performance at Scale
![Page 8: AWS re:Invent 2016: Running Batch Jobs on Amazon ECS (CON310)](https://reader033.vdocuments.site/reader033/viewer/2022042723/587126311a28abe4448b62dd/html5/thumbnails/8.jpg)
Flexible Container Placement
• Applications
• Batch jobs
• Multiple schedulers
![Page 9: AWS re:Invent 2016: Running Batch Jobs on Amazon ECS (CON310)](https://reader033.vdocuments.site/reader033/viewer/2022042723/587126311a28abe4448b62dd/html5/thumbnails/9.jpg)
Designed for Use with Other AWS Services
• Elastic Load Balancing
• Amazon Elastic Block Store
• Amazon Virtual Private Cloud
• AWS Identity and Access Management
• AWS CloudTrail
![Page 10: AWS re:Invent 2016: Running Batch Jobs on Amazon ECS (CON310)](https://reader033.vdocuments.site/reader033/viewer/2022042723/587126311a28abe4448b62dd/html5/thumbnails/10.jpg)
Security
Your own EC2 instances in a VPC, with all of the VPC's security features, provide a high level of isolation.
![Page 11: AWS re:Invent 2016: Running Batch Jobs on Amazon ECS (CON310)](https://reader033.vdocuments.site/reader033/viewer/2022042723/587126311a28abe4448b62dd/html5/thumbnails/11.jpg)
Key Concepts
![Page 12: AWS re:Invent 2016: Running Batch Jobs on Amazon ECS (CON310)](https://reader033.vdocuments.site/reader033/viewer/2022042723/587126311a28abe4448b62dd/html5/thumbnails/12.jpg)
• Tasks
• Containers
• Clusters
• Container Instances
![Page 13: AWS re:Invent 2016: Running Batch Jobs on Amazon ECS (CON310)](https://reader033.vdocuments.site/reader033/viewer/2022042723/587126311a28abe4448b62dd/html5/thumbnails/13.jpg)
![Page 14: AWS re:Invent 2016: Running Batch Jobs on Amazon ECS (CON310)](https://reader033.vdocuments.site/reader033/viewer/2022042723/587126311a28abe4448b62dd/html5/thumbnails/14.jpg)
Task: a grouping of related containers, for example:
• Nginx web server
• Rails application
• MySQL database
• Log collector
![Page 15: AWS re:Invent 2016: Running Batch Jobs on Amazon ECS (CON310)](https://reader033.vdocuments.site/reader033/viewer/2022042723/587126311a28abe4448b62dd/html5/thumbnails/15.jpg)
Task Definition
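{
  "family": "my-website",
  "containerDefinitions": [
    <<CONTAINER DEFINITIONS>>
  ]
}
ECS numbers each registered revision of a family automatically (my-website:1, my-website:2, and so on), so the definition carries no version field.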
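As a rough illustration, the Python sketch below builds the same structure with the field names the ECS RegisterTaskDefinition API expects (`family`, `containerDefinitions`). The helper `build_task_definition` and the sample container are hypothetical; the optional boto3 call is shown only as a comment.

```python
import json


def build_task_definition(family, container_definitions):
    """Return a dict shaped like an ECS RegisterTaskDefinition request.

    ECS assigns the revision number itself when the family is registered,
    so there is no version field to set here.
    """
    if not container_definitions:
        raise ValueError("a task definition needs at least one container")
    names = [c["name"] for c in container_definitions]
    if len(names) != len(set(names)):
        raise ValueError("container names must be unique within a task definition")
    return {"family": family, "containerDefinitions": list(container_definitions)}


if __name__ == "__main__":
    task_definition = build_task_definition(
        "my-website",
        [{"name": "webServer", "image": "nginx:latest", "memory": 128, "essential": True}],
    )
    print(json.dumps(task_definition, indent=2))
    # With boto3 and AWS credentials, the same dict could be registered with:
    #   boto3.client("ecs").register_task_definition(**task_definition)
```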
![Page 16: AWS re:Invent 2016: Running Batch Jobs on Amazon ECS (CON310)](https://reader033.vdocuments.site/reader033/viewer/2022042723/587126311a28abe4448b62dd/html5/thumbnails/16.jpg)
Tasks Containers
ClustersContainer Instances
![Page 17: AWS re:Invent 2016: Running Batch Jobs on Amazon ECS (CON310)](https://reader033.vdocuments.site/reader033/viewer/2022042723/587126311a28abe4448b62dd/html5/thumbnails/17.jpg)
Container Definition
• Names and identifies your image
• Includes default runtime attributes for your container:
  • Environment variables
  • Port mappings
  • Container entry point and commands
  • Resource constraints
  • Etc.
![Page 18: AWS re:Invent 2016: Running Batch Jobs on Amazon ECS (CON310)](https://reader033.vdocuments.site/reader033/viewer/2022042723/587126311a28abe4448b62dd/html5/thumbnails/18.jpg)
Example
{
  "name": "webServer",
  "image": "nginx:latest",
  "cpu": 512,
  "memory": 128,
  "portMappings": [{ "containerPort": 9443, "hostPort": 443 }],
  "links": ["rails"],
  "essential": true
}
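Environment variables and port mappings have a specific shape in a container definition: ECS expects `environment` as a list of `{"name", "value"}` objects with string values, and `portMappings` as a list of objects. The hypothetical helper below converts plain Python values into that shape; it is a sketch, not part of any AWS SDK.

```python
def make_container_definition(name, image, cpu, memory, env=None, ports=None, essential=True):
    """Build an ECS container definition dict (hypothetical helper).

    env:   plain dict of environment variables; ECS wants a list of
           {"name": ..., "value": ...} objects whose values are strings.
    ports: iterable of (container_port, host_port) tuples.
    """
    definition = {
        "name": name,
        "image": image,
        "cpu": cpu,        # CPU units; 1024 units = one vCPU
        "memory": memory,  # hard memory limit in MiB
        "essential": essential,
    }
    if env:
        definition["environment"] = [
            {"name": key, "value": str(value)} for key, value in sorted(env.items())
        ]
    if ports:
        definition["portMappings"] = [
            {"containerPort": c, "hostPort": h, "protocol": "tcp"} for c, h in ports
        ]
    return definition
```

For example, `make_container_definition("webServer", "nginx:latest", 512, 128, ports=[(9443, 443)])` reproduces the port mapping from the slide above.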
![Page 19: AWS re:Invent 2016: Running Batch Jobs on Amazon ECS (CON310)](https://reader033.vdocuments.site/reader033/viewer/2022042723/587126311a28abe4448b62dd/html5/thumbnails/19.jpg)
![Page 20: AWS re:Invent 2016: Running Batch Jobs on Amazon ECS (CON310)](https://reader033.vdocuments.site/reader033/viewer/2022042723/587126311a28abe4448b62dd/html5/thumbnails/20.jpg)
Cluster
• Provides a pool of resources for your Tasks
• A grouping of Container Instances
• Starts empty and scales dynamically
![Page 21: AWS re:Invent 2016: Running Batch Jobs on Amazon ECS (CON310)](https://reader033.vdocuments.site/reader033/viewer/2022042723/587126311a28abe4448b62dd/html5/thumbnails/21.jpg)
![Page 22: AWS re:Invent 2016: Running Batch Jobs on Amazon ECS (CON310)](https://reader033.vdocuments.site/reader033/viewer/2022042723/587126311a28abe4448b62dd/html5/thumbnails/22.jpg)
Container Instance
• An EC2 instance on which Tasks are scheduled
• Use the ECS-optimized AMI, or install the lightweight ECS agent on your own AMI
• Registers with the cluster on launch
• Mix EC2 instance types to add variety to the resource pool
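Registration on launch is driven by the agent's configuration file: the ECS agent on the ECS-optimized AMI reads `/etc/ecs/ecs.config` at start-up, and `ECS_CLUSTER` names the cluster to join. The sketch below generates an EC2 user-data script that writes that setting; the cluster name `batch-cluster` and the helper `ecs_user_data` are hypothetical.

```python
import re
import shlex

ECS_CONFIG = "/etc/ecs/ecs.config"


def ecs_user_data(cluster_name, extra_settings=None):
    """Build an EC2 user-data script that points the ECS agent at a cluster.

    The agent reads /etc/ecs/ecs.config when it starts; without ECS_CLUSTER
    it registers into the cluster named "default".
    """
    # ECS cluster names: up to 255 letters, numbers, hyphens, and underscores
    if not re.fullmatch(r"[A-Za-z0-9_-]{1,255}", cluster_name):
        raise ValueError("invalid ECS cluster name")
    settings = {"ECS_CLUSTER": cluster_name, **(extra_settings or {})}
    lines = ["#!/bin/bash"]
    for key, value in settings.items():
        # shlex.quote keeps values with spaces or shell metacharacters intact
        lines.append(f"echo {shlex.quote(f'{key}={value}')} >> {ECS_CONFIG}")
    return "\n".join(lines) + "\n"
```

Passing the result as the launch configuration's user data lets every instance that the Auto Scaling group starts join the cluster without manual steps.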
![Page 23: AWS re:Invent 2016: Running Batch Jobs on Amazon ECS (CON310)](https://reader033.vdocuments.site/reader033/viewer/2022042723/587126311a28abe4448b62dd/html5/thumbnails/23.jpg)
Architectural Design Patterns
![Page 24: AWS re:Invent 2016: Running Batch Jobs on Amazon ECS (CON310)](https://reader033.vdocuments.site/reader033/viewer/2022042723/587126311a28abe4448b62dd/html5/thumbnails/24.jpg)
Trigger Batch Processing with Lambda
Diagram: a new object in the source S3 bucket invokes AWS Lambda, which calls ecs:RunTask to start Task A on container instances in an Auto Scaling group spanning two Availability Zones; Task A writes its results to the target S3 bucket, with Amazon CloudWatch and AWS CloudTrail for monitoring and auditing.
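The Lambda side of this pattern can be quite small. The sketch below, under stated assumptions, turns an S3 event notification into one `ecs:RunTask` request per object and passes the bucket and key to the container through environment overrides. The cluster, task definition, container name, and the `INPUT_BUCKET`/`INPUT_KEY` variable names are all hypothetical.

```python
from urllib.parse import unquote_plus

# Hypothetical names: replace with your own cluster, task definition, and container.
CLUSTER = "batch-cluster"
TASK_DEFINITION = "batch-worker"
CONTAINER_NAME = "worker"


def run_task_requests(event):
    """Turn an S3 event notification into one ecs:RunTask request per new object."""
    requests = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = unquote_plus(record["s3"]["object"]["key"])  # keys arrive URL-encoded
        requests.append({
            "cluster": CLUSTER,
            "taskDefinition": TASK_DEFINITION,
            "count": 1,
            "startedBy": "s3-trigger",  # ECS allows at most 36 characters here
            "overrides": {
                "containerOverrides": [{
                    "name": CONTAINER_NAME,
                    "environment": [
                        {"name": "INPUT_BUCKET", "value": bucket},
                        {"name": "INPUT_KEY", "value": key},
                    ],
                }]
            },
        })
    return requests


def handler(event, context):
    """Lambda entry point; boto3 is available in the Lambda Python runtime."""
    import boto3  # imported lazily so run_task_requests can be tested offline

    ecs = boto3.client("ecs")
    for request in run_task_requests(event):
        response = ecs.run_task(**request)
        # RunTask reports placement problems in "failures" instead of raising
        if response.get("failures"):
            raise RuntimeError(f"RunTask failed: {response['failures']}")
```

Raising on `failures` makes the invocation fail, so the problem surfaces in Lambda's error metrics rather than being silently dropped.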
![Page 25: AWS re:Invent 2016: Running Batch Jobs on Amazon ECS (CON310)](https://reader033.vdocuments.site/reader033/viewer/2022042723/587126311a28abe4448b62dd/html5/thumbnails/25.jpg)
Fleet of Workers with ECS and SQS
Diagram components: Amazon S3, Amazon DynamoDB, and Amazon Kinesis as event sources; AWS Lambda calling ecs:RunTask; an SQS queue; Task A on container instances in an Auto Scaling group spanning two Availability Zones; Amazon CloudWatch and AWS CloudTrail.
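Each worker in such a fleet typically runs a receive, process, delete loop. The sketch below assumes injected `receive` and `delete` callables (hypothetical names) so it can run without AWS; the key design choice is that a failed message is simply not deleted, so SQS makes it visible again after the visibility timeout and another worker retries it.

```python
def drain_queue(receive, delete, process, max_empty_polls=1):
    """Process messages until `max_empty_polls` consecutive polls come back empty.

    receive(): returns a list of {"ReceiptHandle": ..., "Body": ...} dicts.
    delete(receipt_handle): removes a finished message from the queue.
    process(body): the batch job itself; raise an exception to signal failure.
    Returns (processed, failed) counts.
    """
    processed, failed, empty = 0, 0, 0
    while empty < max_empty_polls:
        messages = receive()
        if not messages:
            empty += 1
            continue
        empty = 0
        for message in messages:
            try:
                process(message["Body"])
            except Exception:
                failed += 1  # not deleted: SQS redelivers it after the visibility timeout
                continue
            delete(message["ReceiptHandle"])
            processed += 1
    return processed, failed
```

With boto3, `receive` could wrap `sqs.receive_message(QueueUrl=url, MaxNumberOfMessages=10, WaitTimeSeconds=20).get("Messages", [])` (long polling), and `delete` could wrap `sqs.delete_message(QueueUrl=url, ReceiptHandle=handle)`.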
![Page 26: AWS re:Invent 2016: Running Batch Jobs on Amazon ECS (CON310)](https://reader033.vdocuments.site/reader033/viewer/2022042723/587126311a28abe4448b62dd/html5/thumbnails/26.jpg)
Long-running Batch Jobs
• Use Spot Instances
• EC2 Spot Blocks for defined-duration workloads
• ECS event stream for CloudWatch Events
• Service scaling and monitoring
Diagram: Tasks A, B, and C on container instances in an Auto Scaling group spanning two Availability Zones, with Amazon CloudWatch and AWS CloudTrail.
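The ECS event stream publishes "ECS Task State Change" events to CloudWatch Events, which makes it straightforward to notice long-running jobs that die. The sketch below shows a rule pattern for stopped tasks and a hypothetical `failed_containers` function that a rule target (for example a Lambda function) could use to pick out containers that exited non-zero or never reported an exit code.

```python
import json

# Event pattern for a CloudWatch Events rule on the ECS event stream
ECS_TASK_STOPPED_PATTERN = {
    "source": ["aws.ecs"],
    "detail-type": ["ECS Task State Change"],
    "detail": {"lastStatus": ["STOPPED"]},
}


def failed_containers(event):
    """Return (taskArn, container name, exit code) for each container in a
    stopped task that exited non-zero or never reported an exit code."""
    detail = event.get("detail", {})
    if detail.get("lastStatus") != "STOPPED":
        return []
    failures = []
    for container in detail.get("containers", []):
        code = container.get("exitCode")  # absent if the container never ran
        if code != 0:
            failures.append((detail.get("taskArn"), container.get("name"), code))
    return failures


if __name__ == "__main__":
    print(json.dumps(ECS_TASK_STOPPED_PATTERN, indent=2))
```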
![Page 27: AWS re:Invent 2016: Running Batch Jobs on Amazon ECS (CON310)](https://reader033.vdocuments.site/reader033/viewer/2022042723/587126311a28abe4448b62dd/html5/thumbnails/27.jpg)
Best Practices
• Store state, inputs, and outputs in S3 or another data store
• Minimize dependencies between task definitions; they should be independent of each other
• Use Spot Instances and Spot Fleet for long-running batch jobs
• Monitor cluster state with the ECS APIs
• Share pools of resources
• Use Auto Scaling, VPC, IAM, and Scheduled Reserved Instances
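As one example of monitoring cluster state through the ECS APIs, the response from DescribeContainerInstances lists each instance's `remainingResources` (CPU units and MiB of memory). The hypothetical function below estimates how many more copies of a task would fit; it is a rough sketch that ignores port conflicts and placement constraints.

```python
def cluster_headroom(describe_response, task_cpu, task_memory):
    """Estimate how many more copies of a task fit on a cluster.

    describe_response: the dict returned by ECS DescribeContainerInstances.
    Only ACTIVE instances with a connected agent are counted, and each
    instance is checked separately because a task cannot span instances.
    """
    if task_cpu <= 0 or task_memory <= 0:
        raise ValueError("task_cpu and task_memory must be positive")
    fits = 0
    for instance in describe_response.get("containerInstances", []):
        if instance.get("status") != "ACTIVE" or not instance.get("agentConnected"):
            continue
        remaining = {
            resource["name"]: resource.get("integerValue", 0)
            for resource in instance.get("remainingResources", [])
        }
        by_cpu = remaining.get("CPU", 0) // task_cpu
        by_memory = remaining.get("MEMORY", 0) // task_memory
        fits += min(by_cpu, by_memory)
    return fits
```

A scheduled job could compare this figure with the depth of the work queue and adjust the Auto Scaling group's desired capacity accordingly.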
![Page 28: AWS re:Invent 2016: Running Batch Jobs on Amazon ECS (CON310)](https://reader033.vdocuments.site/reader033/viewer/2022042723/587126311a28abe4448b62dd/html5/thumbnails/28.jpg)
ECS at Mapbox
![Page 29: AWS re:Invent 2016: Running Batch Jobs on Amazon ECS (CON310)](https://reader033.vdocuments.site/reader033/viewer/2022042723/587126311a28abe4448b62dd/html5/thumbnails/29.jpg)
![Page 30: AWS re:Invent 2016: Running Batch Jobs on Amazon ECS (CON310)](https://reader033.vdocuments.site/reader033/viewer/2022042723/587126311a28abe4448b62dd/html5/thumbnails/30.jpg)
• Maps
• Directions
• Geocoding
• Mobile
• Developer tools
• Analysis
![Page 31: AWS re:Invent 2016: Running Batch Jobs on Amazon ECS (CON310)](https://reader033.vdocuments.site/reader033/viewer/2022042723/587126311a28abe4448b62dd/html5/thumbnails/31.jpg)
![Page 32: AWS re:Invent 2016: Running Batch Jobs on Amazon ECS (CON310)](https://reader033.vdocuments.site/reader033/viewer/2022042723/587126311a28abe4448b62dd/html5/thumbnails/32.jpg)
![Page 33: AWS re:Invent 2016: Running Batch Jobs on Amazon ECS (CON310)](https://reader033.vdocuments.site/reader033/viewer/2022042723/587126311a28abe4448b62dd/html5/thumbnails/33.jpg)
![Page 34: AWS re:Invent 2016: Running Batch Jobs on Amazon ECS (CON310)](https://reader033.vdocuments.site/reader033/viewer/2022042723/587126311a28abe4448b62dd/html5/thumbnails/34.jpg)
![Page 35: AWS re:Invent 2016: Running Batch Jobs on Amazon ECS (CON310)](https://reader033.vdocuments.site/reader033/viewer/2022042723/587126311a28abe4448b62dd/html5/thumbnails/35.jpg)
![Page 36: AWS re:Invent 2016: Running Batch Jobs on Amazon ECS (CON310)](https://reader033.vdocuments.site/reader033/viewer/2022042723/587126311a28abe4448b62dd/html5/thumbnails/36.jpg)
![Page 37: AWS re:Invent 2016: Running Batch Jobs on Amazon ECS (CON310)](https://reader033.vdocuments.site/reader033/viewer/2022042723/587126311a28abe4448b62dd/html5/thumbnails/37.jpg)
![Page 38: AWS re:Invent 2016: Running Batch Jobs on Amazon ECS (CON310)](https://reader033.vdocuments.site/reader033/viewer/2022042723/587126311a28abe4448b62dd/html5/thumbnails/38.jpg)
![Page 39: AWS re:Invent 2016: Running Batch Jobs on Amazon ECS (CON310)](https://reader033.vdocuments.site/reader033/viewer/2022042723/587126311a28abe4448b62dd/html5/thumbnails/39.jpg)
![Page 40: AWS re:Invent 2016: Running Batch Jobs on Amazon ECS (CON310)](https://reader033.vdocuments.site/reader033/viewer/2022042723/587126311a28abe4448b62dd/html5/thumbnails/40.jpg)
![Page 41: AWS re:Invent 2016: Running Batch Jobs on Amazon ECS (CON310)](https://reader033.vdocuments.site/reader033/viewer/2022042723/587126311a28abe4448b62dd/html5/thumbnails/41.jpg)
![Page 42: AWS re:Invent 2016: Running Batch Jobs on Amazon ECS (CON310)](https://reader033.vdocuments.site/reader033/viewer/2022042723/587126311a28abe4448b62dd/html5/thumbnails/42.jpg)
![Page 43: AWS re:Invent 2016: Running Batch Jobs on Amazon ECS (CON310)](https://reader033.vdocuments.site/reader033/viewer/2022042723/587126311a28abe4448b62dd/html5/thumbnails/43.jpg)
![Page 44: AWS re:Invent 2016: Running Batch Jobs on Amazon ECS (CON310)](https://reader033.vdocuments.site/reader033/viewer/2022042723/587126311a28abe4448b62dd/html5/thumbnails/44.jpg)
![Page 45: AWS re:Invent 2016: Running Batch Jobs on Amazon ECS (CON310)](https://reader033.vdocuments.site/reader033/viewer/2022042723/587126311a28abe4448b62dd/html5/thumbnails/45.jpg)
![Page 46: AWS re:Invent 2016: Running Batch Jobs on Amazon ECS (CON310)](https://reader033.vdocuments.site/reader033/viewer/2022042723/587126311a28abe4448b62dd/html5/thumbnails/46.jpg)
![Page 47: AWS re:Invent 2016: Running Batch Jobs on Amazon ECS (CON310)](https://reader033.vdocuments.site/reader033/viewer/2022042723/587126311a28abe4448b62dd/html5/thumbnails/47.jpg)
3 billion probes (100 million miles) per day
![Page 48: AWS re:Invent 2016: Running Batch Jobs on Amazon ECS (CON310)](https://reader033.vdocuments.site/reader033/viewer/2022042723/587126311a28abe4448b62dd/html5/thumbnails/48.jpg)
Similar pattern for batch processing
• EC2 instances
• SQS queue
• Error handling / reporting
![Page 49: AWS re:Invent 2016: Running Batch Jobs on Amazon ECS (CON310)](https://reader033.vdocuments.site/reader033/viewer/2022042723/587126311a28abe4448b62dd/html5/thumbnails/49.jpg)
Introducing Watchbot
![Page 50: AWS re:Invent 2016: Running Batch Jobs on Amazon ECS (CON310)](https://reader033.vdocuments.site/reader033/viewer/2022042723/587126311a28abe4448b62dd/html5/thumbnails/50.jpg)
What is Watchbot?
A library that helps you run a highly scalable AWS service that performs data processing tasks in response to external events.
You provide the messages and the logic to process them, while Watchbot makes sure that your processing task runs at least once for each message.
![Page 51: AWS re:Invent 2016: Running Batch Jobs on Amazon ECS (CON310)](https://reader033.vdocuments.site/reader033/viewer/2022042723/587126311a28abe4448b62dd/html5/thumbnails/51.jpg)
https://github.com/mapbox/ecs-watchbot
![Page 52: AWS re:Invent 2016: Running Batch Jobs on Amazon ECS (CON310)](https://reader033.vdocuments.site/reader033/viewer/2022042723/587126311a28abe4448b62dd/html5/thumbnails/52.jpg)
Diagram: inside the ECS cluster, a Watcher container reads messages from the SQS queue and starts running tasks to process them.
![Page 53: AWS re:Invent 2016: Running Batch Jobs on Amazon ECS (CON310)](https://reader033.vdocuments.site/reader033/viewer/2022042723/587126311a28abe4448b62dd/html5/thumbnails/53.jpg)
Your task can do anything you want!
• Your task can be anything that works in Docker
• Use any language
• Environment variables as input
• Exit codes to indicate success, failure, or retry
• Do any I/O
• Save outputs to S3 or DynamoDB
![Page 54: AWS re:Invent 2016: Running Batch Jobs on Amazon ECS (CON310)](https://reader033.vdocuments.site/reader033/viewer/2022042723/587126311a28abe4448b62dd/html5/thumbnails/54.jpg)
Environment Variables
| Name | Description |
|---|---|
| Subject | the message's subject |
| Message | the message's body |
| MessageId | the message's ID defined by SQS |
| SentTimestamp | the time the message was sent |
| ApproximateFirstReceiveTimestamp | the time the message was first received |
| ApproximateReceiveCount | the number of times the message has been received |
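Because Watchbot hands each message to the task as environment variables, a task can start by collecting them into one object. The Python sketch below is a hypothetical reader (`WatchbotMessage` and `message_from_env` are illustrative names); the defaults for missing optional variables are assumptions, and the timestamps are kept as the strings the task receives rather than parsed into a guessed format.

```python
import os
from dataclasses import dataclass


@dataclass
class WatchbotMessage:
    subject: str
    body: str
    message_id: str
    sent_timestamp: str            # kept as a string; the format is not assumed here
    first_receive_timestamp: str
    receive_count: int


def message_from_env(env=None):
    """Read the per-message environment variables listed in the table above."""
    env = os.environ if env is None else env
    return WatchbotMessage(
        subject=env.get("Subject", ""),
        body=env["Message"],
        message_id=env["MessageId"],
        sent_timestamp=env.get("SentTimestamp", ""),
        first_receive_timestamp=env.get("ApproximateFirstReceiveTimestamp", ""),
        receive_count=int(env.get("ApproximateReceiveCount", "1")),
    )
```

`receive_count` is useful for giving up on a message that keeps failing, for example by rejecting it after a few attempts.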
![Page 55: AWS re:Invent 2016: Running Batch Jobs on Amazon ECS (CON310)](https://reader033.vdocuments.site/reader033/viewer/2022042723/587126311a28abe4448b62dd/html5/thumbnails/55.jpg)
Messages
• Use any format, as long as your task is equipped to handle it
• JSON can capture more complex data
![Page 56: AWS re:Invent 2016: Running Batch Jobs on Amazon ECS (CON310)](https://reader033.vdocuments.site/reader033/viewer/2022042723/587126311a28abe4448b62dd/html5/thumbnails/56.jpg)
Exit Codes
| Exit code | Description | Outcome |
|---|---|---|
| 0 | completed successfully | message is removed from the queue without notification |
| 3 | rejected the message | message is removed from the queue and a notification is sent |
| 4 | no-op | message is returned to the queue without notification |
| other | failure | message is returned to the queue and a notification is sent |
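Putting the exit codes to work, the sketch below is a hypothetical task script that parses a JSON message and maps each outcome to the codes in the table. The message fields `input`, `not_ready`, and `fail`, and the `process` stub, are illustrative assumptions, not part of Watchbot.

```python
import json
import os
import sys

# Exit codes from the table above
SUCCESS = 0   # removed from the queue, no notification
REJECT = 3    # removed from the queue, notification sent
NO_OP = 4     # returned to the queue, no notification
FAILURE = 1   # any other code: returned to the queue, notification sent


def process(job):
    """Placeholder for the real work (fetch input, compute, save to S3 or DynamoDB)."""
    if job.get("fail"):
        raise RuntimeError("simulated processing error")


def handle(message_body):
    """Hypothetical task logic that maps each outcome to a Watchbot exit code."""
    try:
        job = json.loads(message_body)
    except ValueError:
        return REJECT        # malformed input will never succeed, so don't retry it
    if not isinstance(job, dict) or "input" not in job:
        return REJECT
    if job.get("not_ready"):
        return NO_OP         # put it back quietly and try again later
    try:
        process(job)
    except Exception:
        return FAILURE       # retry, and let someone know
    return SUCCESS


if __name__ == "__main__" and "Message" in os.environ:
    sys.exit(handle(os.environ["Message"]))
```

An uncaught Python exception also exits with status 1, so a crash falls into the "other" row and the message is retried with a notification.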
![Page 57: AWS re:Invent 2016: Running Batch Jobs on Amazon ECS (CON310)](https://reader033.vdocuments.site/reader033/viewer/2022042723/587126311a28abe4448b62dd/html5/thumbnails/57.jpg)
More features!
• Logging: write logs to a CloudWatch Logs log group
• Send alarms to SNS
• Reduce mode: tracks the progress of distributed tasks and runs a reduce task when all of them finish
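Reduce mode is built into Watchbot; the class below is only an in-memory illustration of the counting idea, with hypothetical names (`ReduceTracker`, part numbers 0 to n-1). Because delivery is at least once, the tracker ignores duplicate completions so the reduce step runs exactly once. A real deployment would need a shared, durable store such as DynamoDB, since workers run in separate containers.

```python
class ReduceTracker:
    """In-memory sketch of map/reduce progress tracking.

    Each job is split into `total_parts` map tasks; the reduce step runs
    once, when the last distinct part reports completion.
    """

    def __init__(self, reduce_fn):
        self.reduce_fn = reduce_fn
        self.jobs = {}  # job_id -> (total_parts, set of completed part numbers)

    def start(self, job_id, total_parts):
        self.jobs[job_id] = (total_parts, set())

    def complete_part(self, job_id, part):
        """Record one finished part; return True only when this call triggers the reduce."""
        total, done = self.jobs[job_id]
        if not 0 <= part < total:
            raise ValueError(f"part {part} out of range for {total} parts")
        if part in done:
            return False  # duplicate delivery: already counted
        done.add(part)
        if len(done) == total:
            self.reduce_fn(job_id)
            return True
        return False
```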
![Page 58: AWS re:Invent 2016: Running Batch Jobs on Amazon ECS (CON310)](https://reader033.vdocuments.site/reader033/viewer/2022042723/587126311a28abe4448b62dd/html5/thumbnails/58.jpg)
Why not Lambda?
Watchbot is similar in many regards to AWS Lambda, but is
more configurable, more focused on data processing, and
not subject to several of Lambda's limitations.
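• Full control over the execution environment, so you can install anything you want
• No limit on execution time (in 2016, a Lambda function could run for at most 5 minutes)
• No memory limit beyond what the container instance provides (Lambda's maximum was 1,536 MB at the time)
• No concurrency limits or account-wide throttling
• Trade-off: no DynamoDB Streams or Kinesis triggers, which Lambda supports natively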
![Page 59: AWS re:Invent 2016: Running Batch Jobs on Amazon ECS (CON310)](https://reader033.vdocuments.site/reader033/viewer/2022042723/587126311a28abe4448b62dd/html5/thumbnails/59.jpg)
Gotcha: EBS Boot
• The ECS-optimized AMI is only available as an EBS-backed AMI, so consider building your own instance-store-backed AMI
• EBS adds cost, especially when you run many instances on Spot
• EBS volumes are slower than ephemeral (instance store) disks
![Page 60: AWS re:Invent 2016: Running Batch Jobs on Amazon ECS (CON310)](https://reader033.vdocuments.site/reader033/viewer/2022042723/587126311a28abe4448b62dd/html5/thumbnails/60.jpg)
Gotcha: EBS Boot
![Page 61: AWS re:Invent 2016: Running Batch Jobs on Amazon ECS (CON310)](https://reader033.vdocuments.site/reader033/viewer/2022042723/587126311a28abe4448b62dd/html5/thumbnails/61.jpg)
Demo!
![Page 62: AWS re:Invent 2016: Running Batch Jobs on Amazon ECS (CON310)](https://reader033.vdocuments.site/reader033/viewer/2022042723/587126311a28abe4448b62dd/html5/thumbnails/62.jpg)
https://github.com/mapbox/ecs-telephone
![Page 63: AWS re:Invent 2016: Running Batch Jobs on Amazon ECS (CON310)](https://reader033.vdocuments.site/reader033/viewer/2022042723/587126311a28abe4448b62dd/html5/thumbnails/63.jpg)
• 14 data processing services
• 3,500 peak container instances
• 500 million compute hours used this year
![Page 64: AWS re:Invent 2016: Running Batch Jobs on Amazon ECS (CON310)](https://reader033.vdocuments.site/reader033/viewer/2022042723/587126311a28abe4448b62dd/html5/thumbnails/64.jpg)
Thank you!
![Page 65: AWS re:Invent 2016: Running Batch Jobs on Amazon ECS (CON310)](https://reader033.vdocuments.site/reader033/viewer/2022042723/587126311a28abe4448b62dd/html5/thumbnails/65.jpg)
Remember to complete
your evaluations!