Amazon Batch: Simple and Efficient Batch Computing
TRANSCRIPT
![Page 1: Amazon Batch: 實現簡單且有效率的批次運算](https://reader030.vdocuments.site/reader030/viewer/2022021423/588a11ed1a28ab132f8b5a5d/html5/thumbnails/1.jpg)
© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Harry Lin (林書平), Solutions Architect
Amazon Web Services
January 2017
Introducing AWS Batch
Simple and efficient batch computing
![Page 2: Amazon Batch: 實現簡單且有效率的批次運算](https://reader030.vdocuments.site/reader030/viewer/2022021423/588a11ed1a28ab132f8b5a5d/html5/thumbnails/2.jpg)
Agenda
• Batch computing overview
• AWS Batch overview and concepts
• Use cases
• Let’s take it for a spin!
![Page 3: Amazon Batch: 實現簡單且有效率的批次運算](https://reader030.vdocuments.site/reader030/viewer/2022021423/588a11ed1a28ab132f8b5a5d/html5/thumbnails/3.jpg)
What is batch computing?
Run jobs asynchronously and automatically across one or more computers.
Jobs may have dependencies, making the sequencing and scheduling of multiple jobs complex and challenging.
![Page 4: Amazon Batch: 實現簡單且有效率的批次運算](https://reader030.vdocuments.site/reader030/viewer/2022021423/588a11ed1a28ab132f8b5a5d/html5/thumbnails/4.jpg)
CRAY-1: 1976
• First commercial supercomputer
• 167 million calculations/second
• USD $8.86 million ($7.9 million plus $1 million for disk)
CRAY-1 on display in the hallways of the EPFL in Lausanne. https://commons.wikimedia.org/wiki/File:Cray_1_IMG_9126.jpg
![Page 5: Amazon Batch: 實現簡單且有效率的批次運算](https://reader030.vdocuments.site/reader030/viewer/2022021423/588a11ed1a28ab132f8b5a5d/html5/thumbnails/5.jpg)
Early Batch on AWS: NY Times TimesMachine
aws.amazon.com/blogs/aws/new-york-times/
In 2007 the New York Times
processed 130 years of archives in
36 hours.
11 million articles & 4 TB of data
AWS services used:
Amazon S3, SQS, EC2, and EMR
Total cost (in 2007): $890
$240 compute + $650 storage
http://open.blogs.nytimes.com/2007/11/01/self-service-prorated-super-computing-fun/
![Page 6: Amazon Batch: 實現簡單且有效率的批次運算](https://reader030.vdocuments.site/reader030/viewer/2022021423/588a11ed1a28ab132f8b5a5d/html5/thumbnails/6.jpg)
Batch Computing On-Premises
![Page 7: Amazon Batch: 實現簡單且有效率的批次運算](https://reader030.vdocuments.site/reader030/viewer/2022021423/588a11ed1a28ab132f8b5a5d/html5/thumbnails/7.jpg)
[Figure: blocks labeled CPU, RAM, and I/O]
![Page 8: Amazon Batch: 實現簡單且有效率的批次運算](https://reader030.vdocuments.site/reader030/viewer/2022021423/588a11ed1a28ab132f8b5a5d/html5/thumbnails/8.jpg)
[Figure: blocks labeled CPU, RAM, and I/O]
![Page 9: Amazon Batch: 實現簡單且有效率的批次運算](https://reader030.vdocuments.site/reader030/viewer/2022021423/588a11ed1a28ab132f8b5a5d/html5/thumbnails/9.jpg)
[Figure: resource types labeled RAM, I/O, GPU, Storage, CPU, and FPGA]
![Page 10: Amazon Batch: 實現簡單且有效率的批次運算](https://reader030.vdocuments.site/reader030/viewer/2022021423/588a11ed1a28ab132f8b5a5d/html5/thumbnails/10.jpg)
How does this work in the cloud?
![Page 11: Amazon Batch: 實現簡單且有效率的批次運算](https://reader030.vdocuments.site/reader030/viewer/2022021423/588a11ed1a28ab132f8b5a5d/html5/thumbnails/11.jpg)
[Figure: resource types mapped to EC2 instance families]
RAM: R4
I/O: I3
GPU: P2
Storage: D2
CPU: C5
FPGA: F1
![Page 12: Amazon Batch: 實現簡單且有效率的批次運算](https://reader030.vdocuments.site/reader030/viewer/2022021423/588a11ed1a28ab132f8b5a5d/html5/thumbnails/12.jpg)
However, batch computing could be easier…
AWS components:
• EC2
• Spot Fleet
• Auto Scaling
• SNS
• SQS
• CloudWatch
• AWS Lambda
• S3
• DynamoDB
• API Gateway
• …
![Page 13: Amazon Batch: 實現簡單且有效率的批次運算](https://reader030.vdocuments.site/reader030/viewer/2022021423/588a11ed1a28ab132f8b5a5d/html5/thumbnails/13.jpg)
Introducing AWS Batch
Fully managed
No software to install or servers to manage. AWS Batch provisions, manages, and scales your infrastructure.
Integrated with AWS
Natively integrated with the AWS platform, AWS Batch jobs can easily and securely interact with services such as Amazon S3, DynamoDB, and Rekognition.
Cost-optimized resource provisioning
AWS Batch automatically provisions compute resources tailored to the needs of your jobs using Amazon EC2 and Spot.
![Page 14: Amazon Batch: 實現簡單且有效率的批次運算](https://reader030.vdocuments.site/reader030/viewer/2022021423/588a11ed1a28ab132f8b5a5d/html5/thumbnails/14.jpg)
Introducing AWS Batch
• Fully managed batch primitives
• Focus on your applications (shell scripts, Linux executables, Docker images) and their resource requirements
• We take care of the rest!
![Page 15: Amazon Batch: 實現簡單且有效率的批次運算](https://reader030.vdocuments.site/reader030/viewer/2022021423/588a11ed1a28ab132f8b5a5d/html5/thumbnails/15.jpg)
AWS Batch Concepts
• Jobs
• Job definitions
• Job queues
• Compute environments
• Scheduler
![Page 16: Amazon Batch: 實現簡單且有效率的批次運算](https://reader030.vdocuments.site/reader030/viewer/2022021423/588a11ed1a28ab132f8b5a5d/html5/thumbnails/16.jpg)
Jobs
Jobs are the unit of work executed by AWS Batch as containerized
applications running on Amazon EC2.
Containerized jobs can reference a container image, command, and
parameters, or users can simply provide a .zip file containing their
application, and we will run it on a default Amazon Linux container.
$ aws batch submit-job --job-name variant-calling --job-definition gatk --job-queue genomics
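The same submission can also be made from code. The following is a minimal sketch, assuming the boto3 SDK's `submit_job` call on a Batch client. The helper `build_submit_request` and the `RecordingClient` stub are hypothetical names, added only so the example runs without AWS credentials.

```python
def build_submit_request(job_name, job_queue, job_definition):
    """Build keyword arguments for a Batch SubmitJob request."""
    return {
        "jobName": job_name,
        "jobQueue": job_queue,
        "jobDefinition": job_definition,
    }


class RecordingClient:
    """Hypothetical stand-in for boto3.client("batch").

    It records calls instead of contacting AWS.
    """

    def __init__(self):
        self.calls = []

    def submit_job(self, **kwargs):
        self.calls.append(kwargs)
        return {"jobName": kwargs["jobName"], "jobId": f"job-{len(self.calls)}"}


client = RecordingClient()  # swap in boto3.client("batch") to talk to AWS
request = build_submit_request("variant-calling", "genomics", "gatk")
response = client.submit_job(**request)
```

The returned job ID is what later submissions reference when they declare a dependency on this job.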
![Page 17: Amazon Batch: 實現簡單且有效率的批次運算](https://reader030.vdocuments.site/reader030/viewer/2022021423/588a11ed1a28ab132f8b5a5d/html5/thumbnails/17.jpg)
Easily run massively parallel jobs
Today, users can submit a large number of independent “simple jobs.”
In the coming weeks, we will add support for “array jobs” that run
many copies of an application against an array of elements.
Array jobs are an efficient way to run:
• Monte Carlo simulations
• Processing a large collection of objects
• Parametric sweeps
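To make the array idea concrete, here is a small sketch of how one copy of an application could pick its element of a parametric sweep from a zero-based index. It assumes the index arrives in an environment variable; the name `AWS_BATCH_JOB_ARRAY_INDEX` should be treated as an assumption here, since array jobs were not yet available at the time of this talk. The grid values and the `sweep_point` helper are hypothetical.

```python
import itertools
import os


def sweep_point(index, grid):
    """Return the parameter combination for one array element.

    grid maps parameter names to lists of values. Combinations are
    enumerated in a fixed order so every copy agrees on the mapping.
    """
    names = sorted(grid)
    combos = list(itertools.product(*(grid[name] for name in names)))
    if not 0 <= index < len(combos):
        raise IndexError(f"index {index} outside sweep of size {len(combos)}")
    return dict(zip(names, combos[index]))


# Hypothetical sweep: 3 rates x 2 seeds = 6 array elements.
grid = {"rate": [0.1, 0.01, 0.001], "seed": [1, 2]}

# Assumed variable name; defaults to element 0 when run outside a job.
index = int(os.environ.get("AWS_BATCH_JOB_ARRAY_INDEX", "0"))
params = sweep_point(index, grid)
```

Because the mapping depends only on the index, each copy can work out its own parameters without coordinating with the others.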
![Page 18: Amazon Batch: 實現簡單且有效率的批次運算](https://reader030.vdocuments.site/reader030/viewer/2022021423/588a11ed1a28ab132f8b5a5d/html5/thumbnails/18.jpg)
Workflows and Job Dependencies
![Page 19: Amazon Batch: 實現簡單且有效率的批次運算](https://reader030.vdocuments.site/reader030/viewer/2022021423/588a11ed1a28ab132f8b5a5d/html5/thumbnails/19.jpg)
Workflows, Pipelines, and Job Dependencies
Jobs can express a dependency on the successful
completion of other jobs or specific elements of an
array job.
Use your preferred workflow engine and language to
submit jobs. Flow-based systems simply submit jobs
serially, while DAG-based systems submit many jobs
at once, identifying inter-job dependencies.
$ aws batch submit-job --depends-on jobId=606b3ad1-aa31-48d8-92ec-f154bfc8215f ...
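A DAG-based submitter can be sketched in a few lines. This is an illustration rather than any particular workflow engine: it assumes boto3's `submit_job` accepts a `dependsOn` list of `{"jobId": ...}` entries, and it uses a hypothetical `RecordingClient` stub and made-up job names so that it runs offline (Python 3.9+ for `graphlib`).

```python
from graphlib import TopologicalSorter


def submit_workflow(client, job_queue, job_definition, dag):
    """Submit every job in dag, wiring dependsOn to IDs returned earlier.

    dag maps a job name to the set of job names it depends on.
    Returns a dict of job name -> job ID.
    """
    job_ids = {}
    # static_order() yields each job only after all of its dependencies.
    for name in TopologicalSorter(dag).static_order():
        request = {
            "jobName": name,
            "jobQueue": job_queue,
            "jobDefinition": job_definition,
        }
        parents = sorted(dag.get(name, ()))
        if parents:
            request["dependsOn"] = [{"jobId": job_ids[p]} for p in parents]
        job_ids[name] = client.submit_job(**request)["jobId"]
    return job_ids


class RecordingClient:
    """Hypothetical stand-in for boto3.client("batch"); records calls only."""

    def __init__(self):
        self.calls = []

    def submit_job(self, **kwargs):
        self.calls.append(kwargs)
        return {"jobName": kwargs["jobName"], "jobId": f"job-{len(self.calls)}"}


# Hypothetical pipeline: each job lists the jobs it must wait for.
dag = {
    "align": set(),
    "call-variants": {"align"},
    "annotate": {"call-variants"},
    "report": {"call-variants", "annotate"},
}
client = RecordingClient()
job_ids = submit_workflow(client, "genomics", "gatk", dag)
```

All jobs are submitted up front; the declared dependencies, not the submitter, hold back the later stages until the earlier ones succeed.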
![Page 20: Amazon Batch: 實現簡單且有效率的批次運算](https://reader030.vdocuments.site/reader030/viewer/2022021423/588a11ed1a28ab132f8b5a5d/html5/thumbnails/20.jpg)
Job Definitions
Similar to ECS task definitions, AWS Batch job definitions specify how
jobs are to be run. While each job must reference a job definition, many
parameters can be overridden.
Some of the attributes specified in a job definition:
• IAM role associated with the job
• vCPU and memory requirements
• Mount points
• Container properties
• Environment variables
$ aws batch register-job-definition --job-definition-name gatk --container-properties ...
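The relationship between a job definition and per-job overrides can be sketched as a simple merge. The field names below follow boto3's `register_job_definition` and `submit_job` (`containerProperties`, `vcpus`, `memory`, `environment`), but the merge function is a toy model written for illustration, not AWS Batch's own logic, and the image, role ARN, and values are placeholders.

```python
def apply_overrides(container_properties, overrides):
    """Return the effective container settings for one job.

    Toy model of the merge: scalar settings in overrides replace the
    job definition's values; environment variables are merged by name.
    """
    effective = dict(container_properties)
    for key in ("vcpus", "memory", "command"):
        if key in overrides:
            effective[key] = overrides[key]
    env = {e["name"]: e["value"] for e in container_properties.get("environment", [])}
    env.update({e["name"]: e["value"] for e in overrides.get("environment", [])})
    effective["environment"] = [{"name": n, "value": v} for n, v in sorted(env.items())]
    return effective


# Hypothetical job definition; every value here is a placeholder.
job_definition = {
    "jobDefinitionName": "gatk",
    "type": "container",
    "containerProperties": {
        "image": "example/gatk:latest",
        "vcpus": 4,
        "memory": 8192,  # MiB
        "command": ["gatk", "--help"],
        "jobRoleArn": "arn:aws:iam::123456789012:role/example-batch-job-role",
        "environment": [{"name": "REFERENCE", "value": "hg19"}],
    },
}

# One job asks for more memory and adds a variable; the rest is inherited.
overrides = {"memory": 16384, "environment": [{"name": "SAMPLE", "value": "NA12878"}]}
effective = apply_overrides(job_definition["containerProperties"], overrides)
```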
![Page 21: Amazon Batch: 實現簡單且有效率的批次運算](https://reader030.vdocuments.site/reader030/viewer/2022021423/588a11ed1a28ab132f8b5a5d/html5/thumbnails/21.jpg)
Job Queues
Jobs are submitted to a job queue, where they reside until they are
able to be scheduled to a compute resource.
$ aws batch create-job-queue --job-queue-name genomics --priority 500 --compute-environment-order ...
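The elided `--compute-environment-order` argument is an ordered list of compute environments. A minimal sketch of the equivalent request body, assuming boto3's `create_job_queue` parameter names (`jobQueueName`, `priority`, `computeEnvironmentOrder`); the helper and the environment names are hypothetical.

```python
def build_job_queue_request(name, priority, compute_environments):
    """Build keyword arguments for a CreateJobQueue request.

    compute_environments is listed in the order the environments should
    be tried; each entry gets an explicit 1-based 'order' value.
    """
    if not compute_environments:
        raise ValueError("a job queue needs at least one compute environment")
    return {
        "jobQueueName": name,
        "state": "ENABLED",
        "priority": priority,
        "computeEnvironmentOrder": [
            {"order": position, "computeEnvironment": env}
            for position, env in enumerate(compute_environments, start=1)
        ],
    }


# Hypothetical environment names: try Spot capacity first, then On-Demand.
request = build_job_queue_request("genomics", 500, ["spot-ce", "on-demand-ce"])
```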
![Page 22: Amazon Batch: 實現簡單且有效率的批次運算](https://reader030.vdocuments.site/reader030/viewer/2022021423/588a11ed1a28ab132f8b5a5d/html5/thumbnails/22.jpg)
AWS Batch Concepts
Job queues are mapped to one or more compute environments
containing the EC2 instances used to run containerized batch jobs.
Managed compute environments enable you to describe your business
requirements (instance types, min/max/desired vCPUs, and Spot bid
as a % of On-Demand) and we launch and scale resources on your
behalf.
Alternatively, you can launch and manage your own resources within
an unmanaged compute environment. Your instances need to include
the ECS agent and run supported versions of Linux and Docker.
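The "business requirements" of a managed compute environment map onto a small block of settings. The sketch below assumes the `computeResources` field names used by boto3's `create_compute_environment` (`instanceTypes`, `minvCpus`, `desiredvCpus`, `maxvCpus`, `bidPercentage`). The helper is hypothetical, and a real request also needs networking and IAM role settings that are omitted here.

```python
def managed_compute_resources(instance_types, min_vcpus, desired_vcpus,
                              max_vcpus, spot_bid_percent=None):
    """Describe the resources of a managed compute environment.

    When spot_bid_percent is given, Spot capacity is requested with a
    bid expressed as a percentage of the On-Demand price.
    """
    if not 0 <= min_vcpus <= desired_vcpus <= max_vcpus:
        raise ValueError("expected 0 <= min <= desired <= max vCPUs")
    resources = {
        "type": "EC2",
        "instanceTypes": list(instance_types),
        "minvCpus": min_vcpus,
        "desiredvCpus": desired_vcpus,
        "maxvCpus": max_vcpus,
    }
    if spot_bid_percent is not None:
        if not 0 < spot_bid_percent <= 100:
            raise ValueError("Spot bid must be between 1 and 100 percent")
        resources["type"] = "SPOT"
        resources["bidPercentage"] = spot_bid_percent
    return resources


# Hypothetical requirements: scale from 0 to 256 vCPUs, bid 40% of On-Demand.
spot_resources = managed_compute_resources(["c4", "m4"], 0, 0, 256, spot_bid_percent=40)
on_demand_resources = managed_compute_resources(["c4", "m4"], 0, 8, 64)
```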
![Page 23: Amazon Batch: 實現簡單且有效率的批次運算](https://reader030.vdocuments.site/reader030/viewer/2022021423/588a11ed1a28ab132f8b5a5d/html5/thumbnails/23.jpg)
AWS Batch Concepts
The Scheduler evaluates when, where, and
how to run jobs that have been submitted to
a job queue.
Jobs run in approximately the order in which
they are submitted as long as all
dependencies on other jobs have been met.
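The ordering rule can be illustrated with a toy model. This is not AWS Batch's scheduler, only a sketch of "submission order, subject to dependencies", run one job at a time; the job names and both helpers are hypothetical.

```python
def next_runnable(jobs, succeeded):
    """Return jobs whose dependencies have all succeeded, in submission order.

    jobs is a list of (name, dependencies) tuples in submission order.
    """
    return [
        name for name, deps in jobs
        if name not in succeeded and set(deps) <= succeeded
    ]


def run_order(jobs):
    """Simulate running one job at a time until nothing more can run."""
    succeeded, order = set(), []
    while True:
        ready = next_runnable(jobs, succeeded)
        if not ready:
            return order
        # Earliest-submitted ready job goes first.
        order.append(ready[0])
        succeeded.add(ready[0])


# "report" was submitted first but must wait for "align".
jobs = [("report", ["align"]), ("align", []), ("cleanup", [])]
order = run_order(jobs)
```

Here `report` is held back until `align` succeeds, then runs ahead of `cleanup` because it was submitted earlier.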
![Page 24: Amazon Batch: 實現簡單且有效率的批次運算](https://reader030.vdocuments.site/reader030/viewer/2022021423/588a11ed1a28ab132f8b5a5d/html5/thumbnails/24.jpg)
Job States
Jobs submitted to a queue can have the following states:
SUBMITTED: Accepted into the queue, but not yet evaluated for execution
PENDING: Your job has dependencies on other jobs which have not yet completed
RUNNABLE: Your job has been evaluated by the scheduler and is ready to run
STARTING: Your job is in the process of being scheduled to a compute resource
RUNNING: Your job is currently running
SUCCEEDED: Your job has finished with exit code 0
FAILED: Your job finished with non-zero exit code or was cancelled or terminated
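The states above can be modeled as a small state machine. The state names come from the slide; the transition table is an assumption inferred from the descriptions (including that a job without dependencies can skip PENDING), not a documented specification.

```python
from enum import Enum


class JobState(Enum):
    SUBMITTED = "SUBMITTED"
    PENDING = "PENDING"
    RUNNABLE = "RUNNABLE"
    STARTING = "STARTING"
    RUNNING = "RUNNING"
    SUCCEEDED = "SUCCEEDED"
    FAILED = "FAILED"


# Assumed transitions: every non-terminal state can also end in FAILED,
# since a job may be cancelled or terminated before it finishes.
TRANSITIONS = {
    JobState.SUBMITTED: {JobState.PENDING, JobState.RUNNABLE, JobState.FAILED},
    JobState.PENDING: {JobState.RUNNABLE, JobState.FAILED},
    JobState.RUNNABLE: {JobState.STARTING, JobState.FAILED},
    JobState.STARTING: {JobState.RUNNING, JobState.FAILED},
    JobState.RUNNING: {JobState.SUCCEEDED, JobState.FAILED},
    JobState.SUCCEEDED: set(),
    JobState.FAILED: set(),
}


def can_transition(current, target):
    """Return True if the assumed table allows current -> target."""
    return target in TRANSITIONS[current]


def is_terminal(state):
    """A terminal state has no outgoing transitions."""
    return not TRANSITIONS[state]


def state_after_exit(exit_code):
    """Map a finished job's exit code to its final state."""
    return JobState.SUCCEEDED if exit_code == 0 else JobState.FAILED
```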
![Page 25: Amazon Batch: 實現簡單且有效率的批次運算](https://reader030.vdocuments.site/reader030/viewer/2022021423/588a11ed1a28ab132f8b5a5d/html5/thumbnails/25.jpg)
AWS Batch Pricing
There is no charge for AWS Batch; you only pay for the
underlying resources that you consume!
![Page 26: Amazon Batch: 實現簡單且有效率的批次運算](https://reader030.vdocuments.site/reader030/viewer/2022021423/588a11ed1a28ab132f8b5a5d/html5/thumbnails/26.jpg)
AWS Batch Availability
• Generally available in the US East (Northern Virginia) Region
• Support for array jobs and jobs executed as AWS Lambda
functions coming soon!
![Page 27: Amazon Batch: 實現簡單且有效率的批次運算](https://reader030.vdocuments.site/reader030/viewer/2022021423/588a11ed1a28ab132f8b5a5d/html5/thumbnails/27.jpg)
Using the Right Tool for the Right Job
Not all batch workloads are the same
ETL and Big Data processing/analytics?
Consider EMR, Data Pipeline, Redshift, and related services.
Lots of small cron jobs? AWS Batch is a great way to execute these jobs,
but you will likely want a workflow or job-scheduling system to orchestrate
job submissions.
Efficiently run lots of big and small compute jobs on heterogeneous
compute resources? That’s why we are here!
![Page 28: Amazon Batch: 實現簡單且有效率的批次運算](https://reader030.vdocuments.site/reader030/viewer/2022021423/588a11ed1a28ab132f8b5a5d/html5/thumbnails/28.jpg)
DNA Sequencing
![Page 29: Amazon Batch: 實現簡單且有效率的批次運算](https://reader030.vdocuments.site/reader030/viewer/2022021423/588a11ed1a28ab132f8b5a5d/html5/thumbnails/29.jpg)
Financial Trading Analytics
![Page 30: Amazon Batch: 實現簡單且有效率的批次運算](https://reader030.vdocuments.site/reader030/viewer/2022021423/588a11ed1a28ab132f8b5a5d/html5/thumbnails/30.jpg)
Would you like to see a demo?
![Page 31: Amazon Batch: 實現簡單且有效率的批次運算](https://reader030.vdocuments.site/reader030/viewer/2022021423/588a11ed1a28ab132f8b5a5d/html5/thumbnails/31.jpg)
Fully managed
Integrated with AWS
Cost-optimized resource provisioning
![Page 32: Amazon Batch: 實現簡單且有效率的批次運算](https://reader030.vdocuments.site/reader030/viewer/2022021423/588a11ed1a28ab132f8b5a5d/html5/thumbnails/32.jpg)
Thank you!
![Page 33: Amazon Batch: 實現簡單且有效率的批次運算](https://reader030.vdocuments.site/reader030/viewer/2022021423/588a11ed1a28ab132f8b5a5d/html5/thumbnails/33.jpg)
Remember to complete
your evaluations!