launch - Amazon Web Services aws-de-media.s3.amazonaws.com/images/AWS_Summit...

Research Computing @ AWS AWS Worldwide Research Computing 2018-06-06 BER (AWS Summit)


Page 1:

Research Computing @ AWS

AWS Worldwide Research Computing

2018-06-06 – BER (AWS Summit)

Page 2:

Snr Solution Architect

• Leads solution design with AWS public sector customers across DE, AT & CH

• Software Engineer

• Based in Munich

• Owns a cat

Ralph

HPC Specialist

• Recovering Professor

• Aircraft Designer

• Based in London

Scott

Research Computing Manager

• Recovering Physicist & Super Computer Guy

• Based in London

• Owns a dog

Boof

Page 3:

launch

Page 4:

IT’S ABOUT SCIENCE, NOT SERVERS.

#AWSresearchcloud

aws.amazon.com/rcp

Page 5:

DATASETS, TOOLS & TECHNIQUES


Page 6:

Failure

lot of experiments

failed experiments

Page 7:

[Diagram: cycle of Hypothesis, Prediction, Experiment, Results, Refine Model]

Credit: Aristotle

Page 8:
Page 9:
Page 10:

$ telnet example.org 25
S: 220 example.org ESMTP Sendmail 8.13.1/8.13.1; Wed, 30 Aug 2006 07:36:42 -0400
C: HELO mailout1.phrednet.com
S: 250 example.org Hello ip068.subnet71.gci-net.com [216.183.71.68], pleased to meet you
C: MAIL FROM:<[email protected]>
S: 250 2.1.0 <[email protected]>... Sender ok
C: RCPT TO:<[email protected]>
S: 250 2.1.5 <[email protected]>... Recipient ok
C: DATA
S: 354 Enter mail, end with "." on a line by itself
From: Dave\r\nTo: Test Recipient\r\nSubject: SPAM SPAM SPAM\r\n\r\nThis is message 1 from our test script.\r\n.\r\n
S: 250 2.0.0 k7TKIBYb024731 Message accepted for delivery
C: QUIT
S: 221 2.0.0 example.org closing connection
Connection closed by foreign host.
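For contrast with the hand-typed SMTP dialogue above, here is a minimal sketch of the same exchange driven through Python's standard `smtplib` and `email` modules. The sender and recipient addresses and the `send` helper are hypothetical placeholders, not taken from the slide (the original addresses are not recoverable), and nothing is sent unless `send` is called explicitly.

```python
import smtplib
from email.message import EmailMessage


def build_message(sender, recipient, subject, body):
    """Assemble the headers and body that the DATA step carried by hand."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send(msg, host, port=25):
    """Hypothetical helper: deliver msg through the SMTP server at host.

    smtplib issues the greeting, MAIL FROM, RCPT TO and DATA commands;
    leaving the with-block sends QUIT and closes the connection.
    """
    with smtplib.SMTP(host, port) as server:
        server.send_message(msg)


if __name__ == "__main__":
    # Placeholder addresses (assumptions, not from the slide).
    message = build_message(
        "dave@example.org",
        "test@example.org",
        "SPAM SPAM SPAM",
        "This is message 1 from our test script.",
    )
    print(message)  # inspect locally; call send(message, "example.org") to deliver
```

The library hides the protocol steps behind a single call, so the sender only has to describe the message.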

Page 11:
Page 12:

As pretty as an airport

Page 13:

No one

#!/bin/bash
#SBATCH --job-name=gpuMemTest
#SBATCH --output=gpuMemTest_%j.out
#SBATCH --error=gpuMemTest_%j.err
#SBATCH --ntasks=2
#SBATCH --cpus-per-task=1
#SBATCH --distribution=cyclic:cyclic
#SBATCH --time=12:00:00
#SBATCH --mem-per-cpu=2000
##SBATCH --mail-type=END,FAIL
##SBATCH [email protected]
#SBATCH --partition=gpu
#SBATCH --gres=gpu:tesla:2
date;hostname;pwd

module load cuda/9.1.85

cudaMemTest=/ufrc/ufhpc/chasman/Cuda/cudaMemTest/cuda_memtest

cudaDevs=$(echo $CUDA_VISIBLE_DEVICES | sed -e 's/,/ /g')

for cudaDev in $cudaDevs
do
  echo cudaDev = $cudaDev
  #srun --gres=gpu:tesla:1 -n 1 --exclusive ./gpuMemTest.sh > gpuMemTest.out.$cudaDev 2>&1 &
  $cudaMemTest --num_passes 1 --device $cudaDev > gpuMemTest.out.$cudaDev 2>&1 &
done

Page 14:
Page 15:
Page 16:

[Figure: chart of Cores against Time (days)]

Page 17:

[Figure: chart of Cores against Time (days)]

Page 18:

job submit

Page 19:
Page 20:

RFP

Page 21:
Page 22:

Hardware

Page 23:

Humans

Page 24:
Page 25:
Page 26:

Almost Everyone else

Page 27:

PRABHU ET AL (2009)

“Despite enormous wait times, many scientists run their programs only on desktops”

“About a third of researchers did not use any form of parallelism in their research at all”

“Currently, many researchers fit their scientific models to only a subset of available parameters for faster program runs.”

Page 28:

HANNAY ET AL (2009)

• Online survey of 1972 international researchers

• ~80% never use a supercomputer

Page 29:

traditionally

learning job submission syntax

Scaling up

scale down

something new inside

[IT] Technology needs to be in the service of the science, not its master.

Page 30:
Page 31:

laptop → server

server → cluster

CPU → GPU

… in minutes.

Page 32:

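import numpy as np
import pywren

# Multiply a random 1024 x 1024 matrix by a random vector, both drawn from N(0, b).
def my_function(b):
    x = np.random.normal(0, b, 1024)
    A = np.random.normal(0, b, (1024, 1024))
    return np.dot(A, x)

# Map the function over 1000 values of b using PyWren's default executor.
pwex = pywren.default_executor()
res = pwex.map(my_function, np.linspace(0.1, 100, 1000))

PyWren lets you run your existing Python code at massive scale via AWS Lambda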
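The map-over-parameters pattern in the PyWren snippet can be tried on a laptop first. The sketch below is a local stand-in, not PyWren itself: it uses only the Python standard library, so `linspace`, `run_locally`, the smaller matrix size and the fixed seed are all assumptions made for illustration.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def my_function(b, n=64, seed=0):
    """Multiply a random n x n matrix by a random vector, both drawn from N(0, b)."""
    rng = random.Random(seed)
    x = [rng.gauss(0, b) for _ in range(n)]
    A = [[rng.gauss(0, b) for _ in range(n)] for _ in range(n)]
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def linspace(start, stop, num):
    """Evenly spaced values from start to stop inclusive (stand-in for np.linspace)."""
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]


def run_locally(values, workers=4):
    """Map my_function over values on a local thread pool, preserving input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(my_function, values))


if __name__ == "__main__":
    results = run_locally(linspace(0.1, 100, 8))
    print(len(results), "results, each of length", len(results[0]))
```

Because the function takes one parameter and returns one result, the same function body can later be handed to a remote executor without restructuring.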

Page 33:

DEMO

Page 34:

Immediately scale

Create your own software stacks

local catalog

LAPTOP

Most research starts here.

CLOUD

HANNAY ET AL (2009)

Page 35:


Page 36:
Page 37:

WITH GREAT POWER COMES GREAT VISIBILITY

guardrails

governance

Page 38:

DEMO

Page 39:

inside your

account

focus once

security and privacy

architecture.

enforcement

Page 40:

Nextflow includes built-in support for AWS Batch, which allows the execution of containerised workloads over the Amazon Elastic Container Service (ECS). This allows the deployment of Nextflow pipelines in the cloud by offloading the process executions as managed Batch jobs.

The service takes care of spinning up the required computing instances on demand, scaling the number and composition of the instances up and down to best accommodate the actual workload resource needs at any point in time.
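To make "offloading a process execution as a managed Batch job" concrete, here is a minimal sketch of submitting one containerised command to AWS Batch with boto3. It is not Nextflow's implementation. The queue name, job definition name, job name and command are hypothetical, and `submit` needs boto3 plus AWS credentials, so it is defined but not called.

```python
def build_batch_job(name, command, vcpus, memory_mib,
                    queue="research-queue", definition="research-job-def"):
    """Build keyword arguments for the boto3 Batch client's submit_job call.

    queue and definition are hypothetical names; they must match a job queue
    and job definition that already exist in the AWS account.
    """
    return {
        "jobName": name,
        "jobQueue": queue,
        "jobDefinition": definition,
        "containerOverrides": {
            "command": list(command),
            "resourceRequirements": [
                {"type": "VCPU", "value": str(vcpus)},
                {"type": "MEMORY", "value": str(memory_mib)},
            ],
        },
    }


def submit(job_kwargs):
    """Submit the job and return its Batch job ID (requires AWS credentials)."""
    import boto3  # imported lazily so building the request needs no AWS setup
    return boto3.client("batch").submit_job(**job_kwargs)["jobId"]


if __name__ == "__main__":
    job = build_batch_job("align-sample-1", ["bash", "-c", "echo hello"], 2, 4096)
    print(job)  # inspect the request; call submit(job) to run it on AWS Batch
```

A workflow engine would issue one such request per process execution and leave instance provisioning to the Batch compute environment.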

Page 41:
Page 42:
Page 43:

[Figure: charts of Cores, with a line labelled "Fixed Data Centre Capacity Limit"]

Specialized hardware

Unfortunately finite capacity, usually with long queues to wait in.

Burdened with significant workloads that scale well on AWS.

Cloud Expansion Environment

Burst workloads or migrate specific groups to a familiar, almost identical software environment.

Massive capacity when needed to speed up time to results, and an agile environment when additional hardware and software experimentation is needed.

all major job schedulers

Scaling Research in a Hybrid Cluster Environment
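The bursting idea on this slide can be sketched as a simple split: demand up to the fixed data centre limit stays on premises, and anything above it goes to the cloud. The function name and all core counts below are made-up illustrations, not figures from the talk.

```python
def split_demand(requested_cores, fixed_capacity):
    """Return (on_prem_cores, cloud_cores) for one point in time.

    Demand up to fixed_capacity runs on the local cluster; the overflow
    is what would be burst to the cloud.
    """
    if requested_cores < 0 or fixed_capacity < 0:
        raise ValueError("core counts must be non-negative")
    on_prem = min(requested_cores, fixed_capacity)
    return on_prem, requested_cores - on_prem


if __name__ == "__main__":
    fixed_capacity = 1000                  # hypothetical local cluster size
    daily_demand = [400, 900, 1500, 2200]  # hypothetical cores requested per day
    for day, demand in enumerate(daily_demand, start=1):
        on_prem, cloud = split_demand(demand, fixed_capacity)
        print(f"day {day}: {on_prem} cores on premises, {cloud} cores burst to cloud")
```

Without the cloud side, the overflow on busy days would instead wait in the queue.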

Page 44:

[Figure: chart of Cores against Time (days)]

Page 45:

1,500+ popular scientific applications

AWS Marketplace

EC2 Spot market

immediately

Introducing Alces Flight - self-scaling HPC-style clusters instantly ready to compute, billed by the hour and using the AWS Spot market by default to achieve supercomputing for ~1c per core per hour.

http://alces-flight.com/
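As a rough worked example of the "~1c per core per hour" figure, the sketch below multiplies cores, hours and price. It assumes "1c" means one US cent and that the Spot price stays flat for the whole run; real Spot prices vary, so the output is an order-of-magnitude estimate only. The function name and cluster sizes are illustrative.

```python
def cluster_cost_dollars(cores, hours, cents_per_core_hour=1):
    """Estimate the cost of a run in dollars at a flat price per core-hour.

    The default of 1 cent per core-hour is the approximate figure quoted on
    the slide; actual Spot pricing varies by instance type and over time.
    """
    total_cents = cores * hours * cents_per_core_hour
    return total_cents / 100


if __name__ == "__main__":
    # Hypothetical run: 1000 cores for 24 hours.
    print(f"${cluster_cost_dollars(1000, 24):.2f}")
```

At that assumed rate, a 1000-core cluster running for a day costs on the order of a few hundred dollars.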

Page 46:

DEMO

Page 47:
Page 48:

• Humans need the most help right now

• Automate

crap tasks

Don’t be shy

http://boofla.io/ronin101

http://alces-flight.com/

Page 49:

QUESTIONS?

Page 50:

missing manual

Written by Amazon’s Research Computing community for scientists.

• Explains foundational concepts about how AWS can accelerate time-to-science in the cloud.

• Step-by-step best practices for securing your environment to ensure your research data is safe and your privacy is protected.

• Tools for budget management that will help you control your spending and limit costs (and prevent any over-runs).

• Catalogue of scientific solutions from partners chosen for their outstanding work with scientists.

aws.amazon.com/rcp